Sunil Pai - The Cloudflare Blog

Project Think: building the next generation of AI agents on Cloudflare

Sunil Pai — Wed, 15 Apr 2026 13:01:00 GMT

Today, we're introducing Project Think: the next generation of the Agents SDK. Project Think is a set of new primitives for building long-running agents (durable execution, sub-agents, sandboxed code execution, persistent sessions) and an opinionated base class that wires them all together. Use the primitives to build exactly what you need, or use the base class to get started fast.

Something happened earlier this year that changed how we think about AI. Tools like Pi, OpenClaw, Claude Code, and Codex proved a simple but powerful idea: give an LLM the ability to read files, write code, execute it, and remember what it learned, and you get something that looks less like a developer tool and more like a general-purpose assistant.

These coding agents aren't just writing code anymore. People are using them to manage calendars, analyze datasets, negotiate purchases, file taxes, and automate entire business workflows. The pattern is always the same: the agent reads context, reasons about it, writes code to take action, observes the result, and iterates. Code is the universal medium of action.

Our team has been using these coding agents every day. And we kept running into the same walls:

They only run on your laptop or an expensive VPS: there's no sharing, no collaboration, no handoff between devices.
They're expensive when idle: a fixed monthly cost whether the agent is working or not. Scale that to a team, or a company, and it adds up fast.
They require management and manual setup: installing dependencies, managing updates, configuring identity and secrets.

And there's a deeper structural issue. Traditional applications serve many users from one instance. As mentioned in our Welcome to Agents Week post, agents are one-to-one. Each agent is a unique instance, serving one user, running one task. A restaurant has a menu and a kitchen optimized to churn out dishes at volume. An agent is more like a personal chef: different ingredients, different techniques, different tools every time.

That fundamentally changes the scaling math. If a hundred million knowledge workers each use an agentic assistant at even modest concurrency, you need capacity for tens of millions of simultaneous sessions. At current per-container costs, that's unsustainable. We need a different foundation.

That's what we've been building.

Introducing Project Think

Project Think ships a set of new primitives for the Agents SDK:

Durable execution with fibers: crash recovery, checkpointing, automatic keepalive
Sub-agents: isolated child agents with their own SQLite and typed RPC
Persistent sessions: tree-structured messages, forking, compaction, full-text search
Sandboxed code execution: Dynamic Workers, codemode, runtime npm resolution
The execution ladder: workspace, isolate, npm, browser, sandbox
Self-authored extensions: agents that write their own tools at runtime

Each of these is usable directly with the Agent base class. Build exactly what you need with the primitives, or use the Think base class to get started fast. Let's look at what each one does.

Long-running agents

Agents, as they exist today, are ephemeral. They run for a session, tied to a single process or device, and then they are gone. A coding agent that dies when your laptop sleeps, that’s a tool. An agent that persists — that can wake up on demand, continue work after interruptions, and carry forward the state without depending on your local runtime — that starts to look like infrastructure. And it changes the scaling model for agents completely.

The Agents SDK builds on Durable Objects to give every agent an identity, persistent state, and the ability to wake on message. This is the actor model: each agent is an addressable entity with its own SQLite database. It consumes zero compute when hibernated. When something happens (an HTTP request, a WebSocket message, a scheduled alarm, an inbound email) the platform wakes the agent, loads its state, and hands it the event. The agent does its work, then goes back to sleep.

This changes the economics of running agents at scale. Instead of "one expensive agent per power user," you can build "one agent per customer" or "one agent per task" or "one agent per email thread." The marginal cost of spawning a new agent is effectively zero.

Surviving crashes: durable execution with fibers

An LLM call takes 30 seconds. A multi-turn agent loop can run for much longer. At any point during that window, the execution environment can vanish: a deploy, a platform restart, hitting resource limits. The upstream connection to the model provider is severed permanently, in-memory state is lost, and connected clients see the stream stop with no explanation.

runFiber() solves this. A fiber is a durable function invocation: registered in SQLite before execution begins, checkpointable at any point via stash(), and recoverable on restart via onFiberRecovered.

The SDK keeps the agent alive automatically during fiber execution, no special configuration needed. For work measured in minutes, keepAlive() / keepAliveWhile() prevents eviction during active work. For longer operations (CI pipelines, design reviews, video generation) the agent starts the work, persists the job ID, hibernates, and wakes on callback.

Delegating work: sub-agents via Facets

A single agent shouldn't do everything itself. Sub-agents are child Durable Objects colocated with the parent via Facets, each with their own isolated SQLite and execution context:

Sub-agents are isolated at the storage level. Each one gets its own SQLite database, and there’s no implicit sharing of data between them. This is enforced by the runtime where sub-agent RPC latency is a function call. TypeScript catches misuse at compile time.

Conversations that persist: the Session API

Agents that run for days or weeks need more than the typical flat list of messages. The experimental Session API models this explicitly. Available on the Agent base class, conversations are stored as trees, where each message has a parent_id. This enables forking (explore an alternative without losing the original path), non-destructive compaction (summarize older messages rather than deleting them), and full-text search across conversation history via FTS5.

Session is usable directly with Agent, and it's the storage layer that the Think base class builds on.

From tool calls to code execution

Conventional tool-calling has an awkward shape. The model calls a tool, pulls the result back through the context window, calls another tool, pulls that back, and so on. As the tool surface grows, this gets both expensive and clumsy. A hundred files means a hundred round-trips through the model.

But models are better at writing code to use a system than they are at playing the tool-calling game. This is the insight behind @cloudflare/codemode: instead of sequential tool calls, the LLM writes a single program that handles the entire task.

Instead of 100 round-trips to the model, you just run a single program. This leads to fewer tokens used, faster execution, and better results. The Cloudflare API MCP server demonstrates this at scale. We expose only two tools (search() and execute()), which consume ~1,000 tokens, vs. ~1.17 million tokens for the naive tool-per-endpoint equivalent. This is a 99.9% reduction.

The missing primitive: safe sandboxes

Once you accept that models should write code on behalf of users, the question becomes: where does that code run? Not eventually, not after a product team turns it into a roadmap item. Right now, for this user, against this system, with tightly defined permissions.

Dynamic Workers are that sandbox. A fresh V8 isolate spun up at runtime, in milliseconds, with a few megabytes of memory. That's roughly 100x faster and up to 100x more memory-efficient than a container. You can start a new one for every single request, run a snippet of code, and throw it away.

The critical design choice is the capability model. Instead of starting with a general-purpose machine and trying to constrain it, Dynamic Workers begin with almost no ambient authority (globalOutbound: null, no network access) and the developer grants capabilities explicitly, resource by resource, through bindings. We go from asking "how do we stop this thing from doing too much?" to "what exactly do we want this thing to be able to do?"

This is the right question for agent infrastructure.

The execution ladder

This capability model leads naturally to a spectrum of compute environments, an execution ladder that the agent escalates through as needed:

Tier 0 is the Workspace, a durable virtual filesystem backed by SQLite and R2. Read, write, edit, search, grep, diff. Powered by @cloudflare/shell.

Tier 1 is a Dynamic Worker: LLM-generated JavaScript running in a sandboxed isolate with no network access. Powered by @cloudflare/codemode.

Tier 2 adds npm. @cloudflare/worker-bundler fetches packages from the registry, bundles them with esbuild, and loads the result into the Dynamic Worker. The agent writes import { z } from "zod" and it just works.

Tier 3 is a headless browser via Cloudflare Browser Run. Navigate, click, extract, screenshot. Useful when the service doesn't support agents yet via MCP or APIs.

Tier 4 is a Cloudflare Sandbox configured with your toolchains, repos, and dependencies: git clone, npm test, cargo build, synced bidirectionally with the Workspace.

The key design principle: the agent should be useful at Tier 0 alone, where each tier is additive. The user can add capabilities as they go.

Building blocks, not a framework

All of these primitives are available as standalone packages. Dynamic Workers, @cloudflare/codemode, @cloudflare/worker-bundler, and @cloudflare/shell (a durable filesystem with tools) are all usable directly with the Agent base class. You can combine them to give any agent a workspace, code execution, and runtime package resolution without adopting an opinionated framework.

The platform

Here's the complete stack for building agents on Cloudflare:

Each of these is a building block. Together, they form something new: a platform where anyone can build, deploy, and run AI agents as capable as the ones running on your local machine today, but serverless, durable, and safe by construction.

The Think base class

Now that you've seen the primitives, here's what happens when you wire them all together.

Think is an opinionated harness that handles the full chat lifecycle: agentic loop, message persistence, streaming, tool execution, stream resumption, and extensions. You focus on what makes your agent unique.

The minimal subclass looks like this:

That’s effectively all you need to have a working chat agent with streaming, persistence, abort/cancel, error handling, resumable streams, and a built-in workspace filesystem. Deploy with npx wrangler deploy.

Think makes decisions for you. When you need more control, you can override the ones you care about:

Under the hood, Think runs the complete agentic loop on every turn: it assembles the context (base instructions + tool descriptions + skills + memory + conversation history), calls streamText, executes tool calls (with output truncation to prevent context blowup), appends results, loops until the model is done or the step limit is reached. All messages are persisted after each turn.

Lifecycle hooks

Think gives you hooks at every stage of the chat turn, without requiring you to own the whole pipeline:

Switch to a lower cost model for follow-up turns, limit the tools it can use, and pass in client-side context on each turn. Also log every tool call to analytics and automatically trigger one more follow-up turn after the model completes, all without replacing onChatMessage.

Persistent memory and long conversations

Think builds on Session API as its storage layer, giving you tree-structured messages with branching built in.

On top of that, it adds persistent memory through context blocks. These are structured sections of the system prompt that the model can read and update over time, and they persist across hibernation.The model sees "MEMORY (Important facts, use set_context to update) [42%, 462/1100 tokens]" and can proactively remember things.

Sessions are flexible. You can run multiple conversations per agent and fork them to try a different direction without losing the original.

As context grows, Think handles limits with non-destructive compaction. Older messages are summarized instead of removed, while the full history remains stored in SQLite.

Search is built in as well. Using FTS5, you can query conversation history within a session or across all the sessions. The agent is also able to search its own past using search_context tool.

The full execution ladder, wired in

Think integrates the entire execution ladder into a single getTools() return:

Self-authored extensions

Think takes code execution one step further. An agent can write its own extensions: TypeScript programs that run in Dynamic Workers, declaring permissions for network access and workspace operations.

Think's ExtensionManager bundles the extension (optionally with npm deps via @cloudflare/worker-bundler), loads it into a Dynamic Worker, and registers the new tools. The extension persists in DO storage and survives hibernation. The next time the user asks about pull requests, the agent has a github_create_pr tool that didn't exist 30 seconds ago.

This is the kind of self-improvement loop that makes agents genuinely more useful over time. Not through fine-tuning or RLHF, but through code. The agent is able to write new capabilities for itself, all in sandboxed, auditable, and revocable TypeScript.

Sub-agent RPC

Think also works as a sub-agent, called via chat() over RPC from a parent, with streaming events via callback:

Each child gets its own conversation tree, memory, tools, and model. The parent doesn't need to know the details.

Getting started

Project Think is experimental. The API surface is stable but will continue to evolve in the coming days and weeks. We're already using it internally to build our own background agent infrastructure, and we're sharing it early so you can build alongside us.

Think speaks the same WebSocket protocol as @cloudflare/ai-chat, so existing UI components work out of the box. If you've built on AIChatAgent, your client code doesn't change.

The third wave

We see three waves of AI agents:

The first wave was chatbots. They were stateless, reactive, and fragile. Every conversation started from scratch with no memory, no tools, and no ability to act. This made them useful for answering questions, but limited them to only answering questions.

The second wave was coding agents. These are stateful, tool-using and far more capable tools like Pi, Claude Code, OpenClaw, and Codex. These agents can read codebases, write code, execute it, and iterate. These proved that an LLM with the right tools is a general-purpose machine, but they run on your laptop, for one user, with no durability guarantees.

Now we are entering the third wave: agents as infrastructure. Durable, distributed, structurally safe, and serverless. These are agents that run on the Internet, survive failures, cost nothing when idle, and enforce security through architecture rather than behavior. Agents that any developer can build and deploy for any number of users.

This is the direction we’re betting on.

The Agents SDK is already powering thousands of production agents. With Project Think and the the primitives it introduces, we're adding the missing pieces to make those agents dramatically more capable: persistent workspaces, sandboxed code execution, durable long-running tasks, structural security, sub-agent coordination, and self-authored extensions.

It's available today in preview. We're building alongside you, and we'd genuinely love to see what you (and your coding agent) create with it.

^{Think is part of the Cloudflare Agents SDK, available as @cloudflare/think. The features described in this post are in preview. APIs may change as we incorporate feedback. Check the documentation and example to get started.}

Add voice to your agent

Sunil Pai — Wed, 15 Apr 2026 13:00:00 GMT

For many of us, our first experiences with AI agents have been through typing into a chat box. And for those of us using agents day to day, we have likely gotten very good at writing detailed prompts or markdown files to guide them.

But some of the moments where agents would be most useful are not always text-first. You might be on a long commute, juggling a few open sessions, or just wanting to speak naturally to an agent, have it speak back, and continue the interaction.

Adding voice to an agent should not require moving that agent into a separate voice framework. Today, we are releasing an experimental voice pipeline for the Agents SDK.

With @cloudflare/voice, you can add real-time voice to the same Agent architecture you already use. Voice just becomes another way you can talk to the same Durable Object, with the same tools, persistence, and WebSocket connection model that the Agents SDK already provides.

@cloudflare/voice is an experimental package for the Agents SDK that provides:

withVoice(Agent) for full conversation voice agents
withVoiceInput(Agent) for speech-to-text-only use cases, like dictation or voice search
useVoiceAgent and useVoiceInput hooks for React apps
VoiceClient for framework-agnostic clients
Built-in Workers AI providers, so that you can get started without external API keys:
- Continuous STT with Deepgram Flux
- Continuous STT with Deepgram Nova 3
- Text-to-speech with Deepgram Aura

This means you can now build an agent that users can talk to in real time over a single WebSocket connection, while keeping the same Agent class, Durable Object instance, and the same SQLite-backed conversation history.

Just as importantly, we want this to be bigger than one fixed default stack. The provider interfaces in @cloudflare/voice are intentionally small, and we want speech, telephony, and transport providers to build with us, so developers can mix and match the right components for their use case, instead of being locked into a single voice architecture.

Get started with voice

Here’s the minimal server-side pattern for a voice agent in the Agents SDK:

That’s the whole server. You add a continuous transcriber, a text-to-speech provider, and implement onTurn().

On the client side, you can connect to it with a React hook:

If you are not using React, you can use VoiceClient directly from @cloudflare/voice/client.

How the voice pipeline works

With the Agents SDK, every agent is a Durable Object — a stateful, addressable server instance with its own SQLite database, WebSocket connections, and application logic. The voice pipeline extends this model instead of replacing it.

At a high level, the flow looks like this:

Here’s how the pipeline breaks down, step by step:

Audio transport: The browser captures microphone audio and streams 16 kHz mono PCM over the same WebSocket connection the agent already uses.
STT session setup: When the call starts, the agent creates a continuous transcriber session that lives for the duration of the call.
STT input: Audio streams continuously into that session.
STT turn detection: The speech-to-text model itself decides when the user has finished an utterance and emits a stable transcript for that turn.
LLM/application logic: The voice pipeline passes that transcript to your onTurn() method.
TTS output: Your response is synthesized to audio and sent back to the client. If onTurn() returns a stream, the pipeline sentence-chunks it and starts sending audio as sentences are ready.
Persistence: The user and agent messages are persisted in SQLite, so conversation history survives reconnections and deployments.

Why voice should grow with the rest of your agent

Many voice frameworks focus on the voice loop itself: audio in, transcription, model response, audio out. Those are important primitives, but there’s a lot more to an agent than just voice.

Real agents running in production will grow. They need state, scheduling, persistence, tools, workflows, telephony, and ways to keep all of that consistent across channels. As your agent grows in complexity, voice stops being a standalone feature and becomes part of a larger system.

We wanted voice in the Agents SDK to start from that assumption. Instead of building voice as a separate stack, we built it on top of the same Durable Object-based agent platform, so you can pull in the rest of the primitives you need without re-architecting the application later.

Voice and text share the same state

A user might start by typing, switch to voice, and go back to text. With Agents SDK, these are all just different inputs to the same agent. The same conversation history lives in SQLite, and the same tools are available. This gives you both a cleaner mental model and a much simpler application architecture to reason about.

Lower latency comes from...

a shorter network path

Voice experiences feel good or bad very quickly. Once a user stops speaking, the system needs to transcribe, think, and start speaking back fast enough to feel conversational.

A lot of voice latency is not pure model time. It’s the cost of bouncing audio and text between different services in different places. Audio needs to go to STT, transcripts go to an LLM, and responses go to a TTS model – and each handoff adds network overhead.

With the Agents SDK voice pipeline, the agent runs on Cloudflare’s network, and the built-in providers use Workers AI bindings. That keeps the pipeline tighter and reduces the amount of infrastructure you have to stitch together yourself.

built-in streaming

A voice agent interaction feels much more natural if it speaks the first sentence quickly (also called Time-to-First Audio). When onTurn() returns a stream, the pipeline chunks it into sentences and starts synthesis as sentences complete. That means the user can hear the beginning of the answer while the rest is still being generated.

A more realistic backend

Here is a fuller example that streams an LLM response and starts speaking it back, sentence by sentence:

Context.messages gives you recent SQLite-backed conversation history, and context.signal lets the pipeline abort the LLM call if the user interrupts.

Voice as an input: `withVoiceInput`

Not every speech interface needs to speak back. Sometimes you might want dictation, transcription, or voice search. For these use cases, you can use withVoiceInput

On the client, useVoiceInput gives you a lightweight interface centered on transcriptions:

This is useful when speech is an input method, and you don’t need a full conversational loop.

Voice and text on the same connection

The same client can call sendText(“What’s the weather?”), which bypasses STT and sends the text directly to onTurn(). During an active call, the response can be spoken and shown as text. Outside a call, it can remain text-only.

This gives you a genuinely multimodal agent, without splitting the implementation into different code paths.

What else can you build?

Because a voice agent is still an agent, all the normal Agents SDK capabilities still apply.

Tools and scheduling

You can greet a caller when a session starts:

You can schedule spoken reminders and expose tools to your LLM just like any other agent:

Runtime model switching

The voice pipeline also lets you choose a transcription model dynamically per connection.

For example, you might prefer Flux for conversational turn-taking and Nova 3 for higher-accuracy dictation. You can switch at runtime by overriding createTranscriber():

On the client, you can pass query parameters through the hook:

Pipeline hooks

You can also intercept data between stages:

afterTranscribe(transcript, connection)
beforeSynthesize(text, connection)
afterSynthesize(audio, text, connection)

These hooks are useful for content filtering, text normalization, language-specific transformations, or custom logging.

Telephone and transport options

By default, the voice pipeline uses a single WebSocket connection as the simplest path for 1:1 voice agents. But that’s not the only option.

Phone calls via Twilio

You can connect phone calls to the same agent using the Twilio adapter:

This lets the same agent handle web voice, text input, and phone calls.

One caveat: the default Workers AI TTS provider returns MP3, while Twilio expects mulaw 8kHz audio. For production telephony, you may want to use a TTS provider that outputs PCM or mulaw directly.

WebRTC

If you need a transport that is better suited to difficult network conditions or will include multiple participants, the voice package also includes SFU utilities and supports custom transports. The default model is WebSocket-native today, but we plan to develop more adapters to connect to our global SFU infrastructure.

Build with us

The voice pipeline is provider-agnostic by design.

Under the hood, each stage is defined by a small interface: a transcriber opens a continuous session and accepts audio frames as they arrive, while a TTS provider takes text and returns audio. If a provider can stream audio output, the pipeline can use that too.

We didn’t want voice support in Agents SDK to only work with one fixed combination of models and transports. We wanted the default path to be simple, while still making it easy to plug in other providers as the ecosystem grows.

The built-in providers use Workers AI, so you can get started without external API keys:

WorkersAIFluxSTT for conversational streaming STT
WorkersAINova3STT for dictation-style streaming STT
WorkersAITTS for text-to-speech

But the bigger goal is interoperability. If you maintain a speech or voice service, these interfaces are small enough to implement without needing to understand the rest of the SDK internals. If your STT provider accepts streaming audio and can detect utterance boundaries, it can satisfy the transcriber interface. If your TTS provider can stream audio output, even better.

We would love to work on interoperability with:

STT providers like AssemblyAI, Rev.ai, Speechmatics, or any service with a real-time transcription API
TTS providers like PlayHT, LMNT, Cartesia, Coqui, Amazon Polly, or Google Cloud TTS
telephony adapters for platforms like Vonage, Telnyx, or Bandwidth
transport implementations for WebRTC data channels, SFU bridges, and other audio transport layers

We are also interested in collaborations that go beyond individual providers:

latency benchmarking across STT + LLM + TTS combinations
multilingual support and better documentation for non-English voice agents
accessibility work, especially around multimodal interfaces and speech impairments

If you are building voice infrastructure and want to see a first-class integration, open a PR or reach out.

Try it now

The voice pipeline is available today as an experimental package:

Add @cloudflare/voice, give your agent a transcriber and a TTS provider, deploy it, and start talking to it. You can also read the API reference.

If you build something interesting, open an issue or PR on github.com/cloudflare/agents. Voice should not require a separate stack, and we think the best voice agents will be the ones built on the same durable application model as everything else.

Sandboxing AI agents, 100x faster

Kenton Varda — Tue, 24 Mar 2026 13:00:00 GMT

Last September we introduced Code Mode, the idea that agents should perform tasks not by making tool calls, but instead by writing code that calls APIs. We've shown that simply converting an MCP server into a TypeScript API can cut token usage by 81%. We demonstrated that Code Mode can also operate behind an MCP server instead of in front of it, creating the new Cloudflare MCP server that exposes the entire Cloudflare API with just two tools and under 1,000 tokens.

But if an agent (or an MCP server) is going to execute code generated on-the-fly by AI to perform tasks, that code needs to run somewhere, and that somewhere needs to be secure. You can't just eval() AI-generated code directly in your app: a malicious user could trivially prompt the AI to inject vulnerabilities.

You need a sandbox: a place to execute code that is isolated from your application and from the rest of the world, except for the specific capabilities the code is meant to access.

Sandboxing is a hot topic in the AI industry. For this task, most people are reaching for containers. Using a Linux-based container, you can start up any sort of code execution environment you want. Cloudflare even offers our container runtime and our Sandbox SDK for this purpose.

But containers are expensive and slow to start, taking hundreds of milliseconds to boot and hundreds of megabytes of memory to run. You probably need to keep them warm to avoid delays, and you may be tempted to reuse existing containers for multiple tasks, compromising the security.

If we want to support consumer-scale agents, where every end user has an agent (or many!) and every agent writes code, containers are not enough. We need something lighter.

And we have it.

Dynamic Worker Loader: a lean sandbox

Tucked into our Code Mode post in September was the announcement of a new, experimental feature: the Dynamic Worker Loader API. This API allows a Cloudflare Worker to instantiate a new Worker, in its own sandbox, with code specified at runtime, all on the fly.

Dynamic Worker Loader is now in open beta, available to all paid Workers users.

Read the docs for full details, but here's what it looks like:

That's it.

100x faster

Dynamic Workers use the same underlying sandboxing mechanism that the entire Cloudflare Workers platform has been built on since its launch, eight years ago: isolates. An isolate is an instance of the V8 JavaScript execution engine, the same engine used by Google Chrome. They are how Workers work.

An isolate takes a few milliseconds to start and uses a few megabytes of memory. That's around 100x faster and 10x-100x more memory efficient than a typical container.

That means that if you want to start a new isolate for every user request, on-demand, to run one snippet of code, then throw it away, you can.

Unlimited scalability

Many container-based sandbox providers impose limits on global concurrent sandboxes and rate of sandbox creation. Dynamic Worker Loader has no such limits. It doesn't need to, because it is simply an API to the same technology that has powered our platform all along, which has always allowed Workers to seamlessly scale to millions of requests per second.

Want to handle a million requests per second, where every single request loads a separate Dynamic Worker sandbox, all running concurrently? No problem!

Zero latency

One-off Dynamic Workers usually run on the same machine — the same thread, even — as the Worker that created them. No need to communicate around the world to find a warm sandbox. Isolates are so lightweight that we can just run them wherever the request landed. Dynamic Workers are supported in every one of Cloudflare's hundreds of locations around the world.

It's all JavaScript

The only catch, vs. containers, is that your agent needs to write JavaScript.

Technically, Workers (including dynamic ones) can use Python and WebAssembly, but for small snippets of code — like that written on-demand by an agent — JavaScript will load and run much faster.

We humans tend to have strong preferences on programming languages, and while many love JavaScript, others might prefer Python, Rust, or countless others.

But we aren't talking about humans here. We're talking about AI. AI will write any language you want it to. LLMs are experts in every major language. Their training data in JavaScript is immense.

JavaScript, by its nature on the web, is designed to be sandboxed. It is the correct language for the job.

Tools defined in TypeScript

If we want our agent to be able to do anything useful, it needs to talk to external APIs. How do we tell it about the APIs it has access to?

MCP defines schemas for flat tool calls, but not programming APIs. OpenAPI offers a way to express REST APIs, but it is verbose, both in the schema itself and the code you'd have to write to call it.

For APIs exposed to JavaScript, there is a single, obvious answer: TypeScript.

Agents know TypeScript. TypeScript is designed to be concise. With very few tokens, you can give your agent a precise understanding of your API.

Compare this with the equivalent OpenAPI spec (which is so long you have to scroll to see it all):

We think the TypeScript API is better. It's fewer tokens and much easier to understand (for both agents and humans).

Dynamic Worker Loader makes it easy to implement a TypeScript API like this in your own Worker and then pass it in to the Dynamic Worker either as a method parameter or in the env object. The Workers Runtime will automatically set up a Cap'n Web RPC bridge between the sandbox and your harness code, so that the agent can invoke your API across the security boundary without ever realizing that it isn't using a local library.

That means your agent can write code like this:

HTTP filtering and credential injection

If you prefer to give your agents HTTP APIs, that's fully supported. Using the globalOutbound option to the worker loader API, you can register a callback to be invoked on every HTTP request, in which you can inspect the request, rewrite it, inject auth keys, respond to it directly, block it, or anything else you might like.

For example, you can use this to implement credential injection (token injection): When the agent makes an HTTP request to a service that requires authorization, you add credentials to the request on the way out. This way, the agent itself never knows the secret credentials, and therefore cannot leak them.

Using a plain HTTP interface may be desirable when an agent is talking to a well-known API that is in its training set, or when you want your agent to use a library that is built on a REST API (the library can run inside the agent's sandbox).

With that said, in the absence of a compatibility requirement, TypeScript RPC interfaces are better than HTTP:

As shown above, a TypeScript interface requires far fewer tokens to describe than an HTTP interface.
The agent can write code to call TypeScript interfaces using far fewer tokens than equivalent HTTP.
With TypeScript interfaces, since you are defining your own wrapper interface anyway, it is easier to narrow the interface to expose exactly the capabilities that you want to provide to your agent, both for simplicity and security. With HTTP, you are more likely implementing filtering of requests made against some existing API. This is hard, because your proxy must fully interpret the meaning of every API call in order to properly decide whether to allow it, and HTTP requests are complicated, with many headers and other parameters that could all be meaningful. It ends up being easier to just write a TypeScript wrapper that only implements the functions you want to allow.

Battle-hardened security

Hardening an isolate-based sandbox is tricky, as it is a more complicated attack surface than hardware virtual machines. Although all sandboxing mechanisms have bugs, security bugs in V8 are more common than security bugs in typical hypervisors. When using isolates to sandbox possibly-malicious code, it's important to have additional layers of defense-in-depth. Google Chrome, for example, implemented strict process isolation for this reason, but it is not the only possible solution.

We have nearly a decade of experience securing our isolate-based platform. Our systems automatically deploy V8 security patches to production within hours — faster than Chrome itself. Our security architecture features a custom second-layer sandbox with dynamic cordoning of tenants based on risk assessments. We've extended the V8 sandbox itself to leverage hardware features like MPK. We've teamed up with (and hired) leading researchers to develop novel defenses against Spectre. We also have systems that scan code for malicious patterns and automatically block them or apply additional layers of sandboxing. And much more.

When you use Dynamic Workers on Cloudflare, you get all of this automatically.

Helper libraries

We've built a number of libraries that you might find useful when working with Dynamic Workers:

Code Mode

@cloudflare/codemode simplifies running model-generated code against AI tools using Dynamic Workers. At its core is DynamicWorkerExecutor(), which constructs a purpose-built sandbox with code normalisation to handle common formatting errors, and direct access to a globalOutbound fetcher for controlling fetch() behaviour inside the sandbox — set it to null for full isolation, or pass a Fetcher binding to route, intercept or enrich outbound requests from the sandbox.

The Code Mode SDK also provides two server-side utility functions. codeMcpServer({ server, executor }) wraps an existing MCP Server, replacing its tool surface with a single code() tool. openApiMcpServer({ spec, executor, request }) goes further: given an OpenAPI spec and an executor, it builds a complete MCP Server with search() and execute() tools as used by the Cloudflare MCP Server, and better suited to larger APIs.

In both cases, the code generated by the model runs inside Dynamic Workers, with calls to external services made over RPC bindings passed to the executor.

Learn more about the library and how to use it.

Bundling

Dynamic Workers expect pre-bundled modules. @cloudflare/worker-bundler handles that for you: give it source files and a package.json, and it resolves npm dependencies from the registry, bundles everything with esbuild, and returns the module map the Worker Loader expects.

It also supports full-stack apps via createApp — bundle a server Worker, client-side JavaScript, and static assets together, with built-in asset serving that handles content types, ETags, and SPA routing.

Learn more about the library and how to use it.

File manipulation

@cloudflare/shell gives your agent a virtual filesystem inside a Dynamic Worker. Agent code calls typed methods on a state object — read, write, search, replace, diff, glob, JSON query/update, archive — with structured inputs and outputs instead of string parsing.

Storage is backed by a durable Workspace (SQLite + R2), so files persist across executions. Coarse operations like searchFiles, replaceInFiles, and planEdits minimize RPC round-trips — the agent issues one call instead of looping over individual files. Batch writes are transactional by default: if any write fails, earlier writes roll back automatically.

The package also ships prebuilt TypeScript type declarations and a system prompt template, so you can drop the full state API into your LLM context in a handful of tokens.

Learn more about the library and how to use it.

How are people using it?

Code Mode

Developers want their agents to write and execute code against tool APIs, rather than making sequential tool calls one at a time. With Dynamic Workers, the LLM generates a single TypeScript function that chains multiple API calls together, runs it in a Dynamic Worker, and returns the final result back to the agent. As a result, only the output, and not every intermediate step, ends up in the context window. This cuts both latency and token usage, and produces better results, especially when the tool surface is large.

Our own Cloudflare MCP server is built exactly this way: it exposes the entire Cloudflare API through just two tools — search and execute — in under 1,000 tokens, because the agent writes code against a typed API instead of navigating hundreds of individual tool definitions.

Building custom automations

Developers are using Dynamic Workers to let agents build custom automations on the fly. Zite, for example, is building an app platform where users interact through a chat interface — the LLM writes TypeScript behind the scenes to build CRUD apps, connect to services like Stripe, Airtable, and Google Calendar, and run backend logic, all without the user ever seeing a line of code. Every automation runs in its own Dynamic Worker, with access to only the specific services and libraries that the endpoint needs.

“To enable server-side code for Zite’s LLM-generated apps, we needed an execution layer that was instant, isolated, and secure. Cloudflare’s Dynamic Workers hit the mark on all three, and out-performed all of the other platforms we benchmarked for speed and library support. The NodeJS compatible runtime supported all of Zite’s workflows, allowing hundreds of third party integrations, without sacrificing on startup time. Zite now services millions of execution requests daily thanks to Dynamic Workers.”

— Antony Toron, CTO and Co-Founder, Zite

Running AI-generated applications

Developers are building platforms that generate full applications from AI — either for their customers or for internal teams building prototypes. With Dynamic Workers, each app can be spun up on demand, then put back into cold storage until it's invoked again. Fast startup times make it easy to preview changes during active development. Platforms can also block or intercept any network requests the generated code makes, keeping AI-generated apps safe to run.

Pricing

Dynamically-loaded Workers are priced at $0.002 per unique Worker loaded per day (as of this post’s publication), in addition to the usual CPU time and invocation pricing of regular Workers.

For AI-generated "code mode" use cases, where every Worker is a unique one-off, this means the price is $0.002 per Worker loaded (plus CPU and invocations). This cost is typically negligible compared to the inference costs to generate the code.

During the beta period, the $0.002 charge is waived. As pricing is subject to change, please always check our Dynamic Workers pricing for the most current information.

Get Started

If you’re on the Workers Paid plan, you can start using Dynamic Workers today.

Dynamic Workers Starter

Use this “hello world” starter to get a Worker deployed that can load and execute Dynamic Workers.

Dynamic Workers Playground

You can also deploy the Dynamic Workers Playground, where you’ll be able to write or import code, bundle it at runtime with @cloudflare/worker-bundler, execute it through a Dynamic Worker, see real-time responses and execution logs.

Dynamic Workers are fast, scalable, and lightweight. Find us on Discord if you have any questions. We’d love to see what you build!

Code Mode: the better way to use MCP

Kenton Varda — Fri, 26 Sep 2025 13:00:00 GMT

It turns out we've all been using MCP wrong.

Most agents today use MCP by directly exposing the "tools" to the LLM.

We tried something different: Convert the MCP tools into a TypeScript API, and then ask an LLM to write code that calls that API.

The results are striking:

We found agents are able to handle many more tools, and more complex tools, when those tools are presented as a TypeScript API rather than directly. Perhaps this is because LLMs have an enormous amount of real-world TypeScript in their training set, but only a small set of contrived examples of tool calls.
The approach really shines when an agent needs to string together multiple calls. With the traditional approach, the output of each tool call must feed into the LLM's neural network, just to be copied over to the inputs of the next call, wasting time, energy, and tokens. When the LLM can write code, it can skip all that, and only read back the final results it needs.

In short, LLMs are better at writing code to call MCP, than at calling MCP directly.

What's MCP?

For those that aren't familiar: Model Context Protocol is a standard protocol for giving AI agents access to external tools, so that they can directly perform work, rather than just chat with you.

Seen another way, MCP is a uniform way to:

expose an API for doing something,
along with documentation needed for an LLM to understand it,
with authorization handled out-of-band.

MCP has been making waves throughout 2025 as it has suddenly greatly expanded the capabilities of AI agents.

The "API" exposed by an MCP server is expressed as a set of "tools". Each tool is essentially a remote procedure call (RPC) function – it is called with some parameters and returns a response. Most modern LLMs have the capability to use "tools" (sometimes called "function calling"), meaning they are trained to output text in a certain format when they want to invoke a tool. The program invoking the LLM sees this format and invokes the tool as specified, then feeds the results back into the LLM as input.

Anatomy of a tool call

Under the hood, an LLM generates a stream of "tokens" representing its output. A token might represent a word, a syllable, some sort of punctuation, or some other component of text.

A tool call, though, involves a token that does not have any textual equivalent. The LLM is trained (or, more often, fine-tuned) to understand a special token that it can output that means "the following should be interpreted as a tool call," and another special token that means "this is the end of the tool call." Between these two tokens, the LLM will typically write tokens corresponding to some sort of JSON message that describes the call.

For instance, imagine you have connected an agent to an MCP server that provides weather info, and you then ask the agent what the weather is like in Austin, TX. Under the hood, the LLM might generate output like the following. Note that here we've used words in <| and |> to represent our special tokens, but in fact, these tokens do not represent text at all; this is just for illustration.

I will use the Weather MCP server to find out the weather in Austin, TX.

Upon seeing these special tokens in the output, the LLM's harness will interpret the sequence as a tool call. After seeing the end token, the harness pauses execution of the LLM. It parses the JSON message and returns it as a separate component of the structured API result. The agent calling the LLM API sees the tool call, invokes the relevant MCP server, and then sends the results back to the LLM API. The LLM's harness will then use another set of special tokens to feed the result back into the LLM:

The LLM reads these tokens in exactly the same way it would read input from the user – except that the user cannot produce these special tokens, so the LLM knows it is the result of the tool call. The LLM then continues generating output like normal.

Different LLMs may use different formats for tool calling, but this is the basic idea.

What's wrong with this?

The special tokens used in tool calls are things LLMs have never seen in the wild. They must be specially trained to use tools, based on synthetic training data. They aren't always that good at it. If you present an LLM with too many tools, or overly complex tools, it may struggle to choose the right one or to use it correctly. As a result, MCP server designers are encouraged to present greatly simplified APIs as compared to the more traditional API they might expose to developers.

Meanwhile, LLMs are getting really good at writing code. In fact, LLMs asked to write code against the full, complex APIs normally exposed to developers don't seem to have too much trouble with it. Why, then, do MCP interfaces have to "dumb it down"? Writing code and calling tools are almost the same thing, but it seems like LLMs can do one much better than the other?

The answer is simple: LLMs have seen a lot of code. They have not seen a lot of "tool calls". In fact, the tool calls they have seen are probably limited to a contrived training set constructed by the LLM's own developers, in order to try to train it. Whereas they have seen real-world code from millions of open source projects.

Making an LLM perform tasks with tool calling is like putting Shakespeare through a month-long class in Mandarin and then asking him to write a play in it. It's just not going to be his best work.

But MCP is still useful, because it is uniform

MCP is designed for tool-calling, but it doesn't actually have to be used that way.

The "tools" that an MCP server exposes are really just an RPC interface with attached documentation. We don't really have to present them as tools. We can take the tools, and turn them into a programming language API instead.

But why would we do that, when the programming language APIs already exist independently? Almost every MCP server is just a wrapper around an existing traditional API – why not expose those APIs?

Well, it turns out MCP does something else that's really useful: It provides a uniform way to connect to and learn about an API.

An AI agent can use an MCP server even if the agent's developers never heard of the particular MCP server, and the MCP server's developers never heard of the particular agent. This has rarely been true of traditional APIs in the past. Usually, the client developer always knows exactly what API they are coding for. As a result, every API is able to do things like basic connectivity, authorization, and documentation a little bit differently.

This uniformity is useful even when the AI agent is writing code. We'd like the AI agent to run in a sandbox such that it can only access the tools we give it. MCP makes it possible for the agentic framework to implement this, by handling connectivity and authorization in a standard way, independent of the AI code. We also don't want the AI to have to search the Internet for documentation; MCP provides it directly in the protocol.

OK, how does it work?

We have already extended the Cloudflare Agents SDK to support this new model!

For example, say you have an app built with ai-sdk that looks like this:

You can wrap the tools and prompt with the codemode helper, and use them in your app:

With this change, your app will now start generating and running code that itself will make calls to the tools you defined, MCP servers included. We will introduce variants for other libraries in the very near future. Read the docs for more details and examples.

Converting MCP to TypeScript

When you connect to an MCP server in "code mode", the Agents SDK will fetch the MCP server's schema, and then convert it into a TypeScript API, complete with doc comments based on the schema.

For example, connecting to the MCP server at https://gitmcp.io/cloudflare/agents, will generate a TypeScript definition like this:

This TypeScript is then loaded into the agent's context. Currently, the entire API is loaded, but future improvements could allow an agent to search and browse the API more dynamically – much like an agentic coding assistant would.

Running code in a sandbox

Instead of being presented with all the tools of all the connected MCP servers, our agent is presented with just one tool, which simply executes some TypeScript code.

The code is then executed in a secure sandbox. The sandbox is totally isolated from the Internet. Its only access to the outside world is through the TypeScript APIs representing its connected MCP servers.

These APIs are backed by RPC invocation which calls back to the agent loop. There, the Agents SDK dispatches the call to the appropriate MCP server.

The sandboxed code returns results to the agent in the obvious way: by invoking console.log(). When the script finishes, all the output logs are passed back to the agent.

Dynamic Worker loading: no containers here

This new approach requires access to a secure sandbox where arbitrary code can run. So where do we find one? Do we have to run containers? Is that expensive?

No. There are no containers. We have something much better: isolates.

The Cloudflare Workers platform has always been based on V8 isolates, that is, isolated JavaScript runtimes powered by the V8 JavaScript engine.

Isolates are far more lightweight than containers. An isolate can start in a handful of milliseconds using only a few megabytes of memory.

Isolates are so fast that we can just create a new one for every piece of code the agent runs. There's no need to reuse them. There's no need to prewarm them. Just create it, on demand, run the code, and throw it away. It all happens so fast that the overhead is negligible; it's almost as if you were just eval()ing the code directly. But with security.

The Worker Loader API

Until now, though, there was no way for a Worker to directly load an isolate containing arbitrary code. All Worker code instead had to be uploaded via the Cloudflare API, which would then deploy it globally, so that it could run anywhere. That's not what we want for Agents! We want the code to just run right where the agent is.

To that end, we've added a new API to the Workers platform: the Worker Loader API. With it, you can load Worker code on-demand. Here's what it looks like:

You can start playing with this API right now when running workerd locally with Wrangler (check out the docs), and you can sign up for beta access to use it in production.

Workers are better sandboxes

The design of Workers makes it unusually good at sandboxing, especially for this use case, for a few reasons:

Faster, cheaper, disposable sandboxes

The Workers platform uses isolates instead of containers. Isolates are much lighter-weight and faster to start up. It takes mere milliseconds to start a fresh isolate, and it's so cheap we can just create a new one for every single code snippet the agent generates. There's no need to worry about pooling isolates for reuse, prewarming, etc.

We have not yet finalized pricing for the Worker Loader API, but because it is based on isolates, we will be able to offer it at a significantly lower cost than container-based solutions.

Isolated by default, but connected with bindings

Workers are just better at handling isolation.

In Code Mode, we prohibit the sandboxed worker from talking to the Internet. The global fetch() and connect() functions throw errors.

But on most platforms, this would be a problem. On most platforms, the way you get access to private resources is, you start with general network access. Then, using that network access, you send requests to specific services, passing them some sort of API key to authorize private access.

But Workers has always had a better answer. In Workers, the "environment" (env object) doesn't just contain strings, it contains live objects, also known as "bindings". These objects can provide direct access to private resources without involving generic network requests.

In Code Mode, we give the sandbox access to bindings representing the MCP servers it is connected to. Thus, the agent can specifically access those MCP servers without having network access in general.

Limiting access via bindings is much cleaner than doing it via, say, network-level filtering or HTTP proxies. Filtering is hard on both the LLM and the supervisor, because the boundaries are often unclear: the supervisor may have a hard time identifying exactly what traffic is legitimately necessary to talk to an API. Meanwhile, the LLM may have difficulty guessing what kinds of requests will be blocked. With the bindings approach, it's well-defined: the binding provides a JavaScript interface, and that interface is allowed to be used. It's just better this way.

No API keys to leak

An additional benefit of bindings is that they hide API keys. The binding itself provides an already-authorized client interface to the MCP server. All calls made on it go to the agent supervisor first, which holds the access tokens and adds them into requests sent on to MCP.

This means that the AI cannot possibly write code that leaks any keys, solving a common security problem seen in AI-authored code today.

Try it now!

Sign up for the production beta

The Dynamic Worker Loader API is in closed beta. To use it in production, sign up today.

Or try it locally

If you just want to play around, though, Dynamic Worker Loading is fully available today when developing locally with Wrangler and workerd – check out the docs for Dynamic Worker Loading and code mode in the Agents SDK to get started.

Building agents with OpenAI and Cloudflare’s Agents SDK

Kate Reznykova — Wed, 25 Jun 2025 14:00:00 GMT

What even is an Agents SDK?

The AI landscape is evolving at an incredible pace, and with it, the tools and platforms available to developers are becoming more powerful and interconnected than ever. Here at Cloudflare, we're genuinely passionate about empowering you to build the next generation of applications, and that absolutely includes intelligent agents that can reason, act, and interact with the world.

When we talk about "Agents SDKs", it can sometimes feel a bit… fuzzy. Some SDKs (software development kits) described as 'agent' SDKs are really about providing frameworks for tool calling and interacting with models. They're fantastic for defining an agent's "brain" – its intelligence, its ability to reason, and how it uses external tools. Here’s the thing: all these agents need a place to actually run. Then there's what we offer at Cloudflare: an SDK purpose-built to provide a seamless execution layer for agents. While orchestration frameworks define how agents think, our SDK focuses on where they run, abstracting away infrastructure to enable persistent, scalable execution across our global network.

Think of it as the ultimate shell, the place where any agent, defined by any agent SDK (like the powerful new OpenAI Agents SDK), can truly live, persist, and run at global scale.

We’ve chosen OpenAI’s Agents SDK for this example, but the infrastructure is not specific to it. The execution layer is designed to integrate with any agent runtime.

That’s what this post is about: what we built, what we learned, and the design patterns that emerged from fusing these two pieces together.

Why use two SDKs?

OpenAI’s Agents SDK gives you the agent: a reasoning loop, tool definitions, and memory abstraction. But it assumes you bring your own runtime and state.

Cloudflare’s Agents SDK gives you the environment: a persistent object on our network with identity, state, and built-in concurrency control. But it doesn’t tell you how your agent should behave.

By combining them, we get a clear split:

OpenAI: cognition, planning, tool orchestration
Cloudflare: location, identity, memory, execution

This separation of concerns let us stay focused on logic, not glue code.

What you can build with persistent agents

Cloudflare Durable Objects let agents go beyond simple, stateless functions. They can persist memory, coordinate across workflows, and respond in real time. Combined with the OpenAI Agents SDK, this enables systems that reason, remember, and adapt over time.

Here are three architectural patterns that show how agents can be composed, guided, and connected:

Multi-agent systems: Divide responsibilities across specialized agents that collaborate on tasks.

Human-in-the-loop: Let agents plan independently but wait for human input at key decision points.

Addressable agents: Make agents reachable through real-world interfaces like phone calls or WebSockets.

Multi-agent systems

Multi-agent systems let you break down a task into specialized agents that handle distinct responsibilities. In the example below, a triage agent routes questions to either a history or math tutor based on the query. Each agent has its own memory, logic, and instructions. With Cloudflare Durable Objects, these agents persist across sessions and can coordinate responses, making it easy to build systems that feel modular but work together intelligently.

Human-in-the-loop

We implemented a human-in-the-loop agent example using these two SDKs together. The goal: run an OpenAI agent with a planning loop, allow human decisions to intercept the plan, and preserve state across invocations via Durable Objects.

The architecture looked like this:

An OpenAI Agent instance runs inside a Durable Object
User submits a prompt
The agent plans multiple steps
After each step, it yields control and waits for a human to approve or intervene
State (including memory and intermediate steps) is persisted in this.state

It looks like this:

This design lets us intercept the agent’s plan at every step and store it. The client could then:

Fetch the pending step via another route
Review or modify it
Send approval or rejection back to the agent to resume execution

This is only possible because the agent lives inside a Durable Object. It has persistent memory and identity, allowing multi-turn interaction even across sessions

Addressable agents: “Call my Agent”

One of the most interesting takeaways from this pattern is that agents are not just HTTP endpoints. Yes, you can fetch() them via Durable Objects, but conceptually, agents are addressable entities — and there's no reason those addresses have to be tied to URLs.

You could imagine agents reachable by phone call, by email, or via pub/sub systems. Durable Objects give each agent a global identity that can be referenced however you want.

In this design:

External sources of input connect to the Cloudflare network; via email, HTTP, or any network interface. In this demo, we use Twilio to route a phone call to a WebSocket input on the Agent.
The call is routed through Cloudflare’s infrastructure, so latency is low and identity is preserved.
We also store the real-time state updates within the agent, so we can view it on a website (served by the agent itself). This is great for use cases like customer service and education.

This lets an agent become truly multimodal, accepting and outputting data as audio, video, text, email. This pattern opened up exciting possibilities for modular agents and long-running workflows where each agent focuses on a specific domain.

What we learned (and what you should know)

1. OpenAI assumes you bring your own state — Cloudflare gives you one

OpenAI’s SDK is stateless by default. You can attach memory abstractions, but the SDK doesn’t tell you where or how to persist it. Cloudflare’s Durable Objects, by contrast, are persistent — that’s the whole point. Every instance has a unique identity and storage API (this.ctx.storage). This means we can:

Store long-term memory across invocations
Hydrate the agent’s memory before run()
Save any updates after run() completes

2. Durable Object routing isn’t just routing — it’s your agent factory

At first glance, routeAgentRequest looks like a simple dispatcher: map a request to a Durable Object based on a URL. But it plays a deeper role — it defines the identity boundary for your agents. We realized this while trying to scope agent instances per user and per task.

In Durable Objects, identity is tied to an ID. When you call idFromName(), you get a stable, name-based ID that always maps to the same object. This means repeated calls with the same name return the same agent instance — along with its memory and state. In contrast, calling .newUniqueId() creates a new, isolated object each time.

This is where routing becomes critical: it's where you decide how long an agent should live, and what it should remember.

This lets us:

Spin up multiple agents per user (e.g. one per session or task)
Co-locate memory and logic
Avoid unintended memory sharing between conversations

Gotcha: If you forget to use idFromName() and just call .newUniqueId(), you’ll get a new agent each time, and your memory will never persist. This is a common early bug that silently kills statefulness.

3. Agents are composable — and that’s powerful

Agents can invoke each other using Durable Object routing, forming workflows where each agent owns its own memory and logic. This enables composition — building systems from specialized parts that cooperate.

This makes agent architecture more like microservices — composable, stateful, and distributed.

Final thoughts: building agents that think and live

This pattern — OpenAI cognition + Cloudflare execution — worked better than we expected. It let us:

Write agents with full planning and memory
Pause and resume them asynchronously
Avoid building orchestration from scratch
Compose multiple agents into larger systems

The hardest parts:

Correctly scoping agent architecture
Persisting only valid state
Debugging with good observability

At Cloudflare, we are incredibly excited to see what you build with this powerful combination. The future of AI agents is intelligent, distributed, and incredibly capable. Get started today by exploring the OpenAI Agents SDK and diving into the Cloudflare Agents SDK documentation (which leverages Cloudflare Workers and Durable Objects).

We’re just getting started, and we love to see all that you build. Please join our Discord, ask questions, and tell us what you’re building.

Connect any React application to an MCP server in three lines of code

Dina Kozlov — Wed, 18 Jun 2025 13:00:00 GMT

You can deploy a remote Model Context Protocol (MCP) server on Cloudflare in just one-click. Don’t believe us? Click the button below.

This will get you started with a remote MCP server that supports the latest MCP standards and is the reason why thousands of remote MCP servers have been deployed on Cloudflare, including ones from companies like Atlassian, Linear, PayPal, and more.

But deploying servers is only half of the equation — we also wanted to make it just as easy to build and deploy remote MCP clients that can connect to these servers to enable new AI-powered service integrations. That's why we built use-mcp, a React library for connecting to remote MCP servers, and we're excited to contribute it to the MCP ecosystem to enable more developers to build remote MCP clients.

Today, we're open-sourcing two tools that make it easy to build and deploy MCP clients:

use-mcp — A React library that connects to any remote MCP server in just 3 lines of code, with transport, authentication, and session management automatically handled. We're excited to contribute this library to the MCP ecosystem to enable more developers to build remote MCP clients.
The AI Playground — Cloudflare’s AI chat interface platform that uses a number of LLM models to interact with remote MCP servers, with support for the latest MCP standard, which you can now deploy yourself.

Whether you're building an AI-powered chat bot, a support agent, or an internal company interface, you can leverage these tools to connect your AI agents and applications to external services via MCP.

Ready to get started? Click on the button below to deploy your own instance of Cloudflare’s AI Playground to see it in action.

use-mcp: a React library for building remote MCP clients

use-mcp is a React library that abstracts away all the complexity of building MCP clients. Add the useMCP() hook into any React application to connect to remote MCP servers that users can interact with.

Here’s all the code you need to add to connect to a remote MCP server:

Just specify the URL, and you're instantly connected.

Behind the scenes, use-mcp handles the transport protocols (both Streamable HTTP and Server-Sent Events), authentication flows, and session management. It also includes a number of features to help you build reliable, scalable, and production-ready MCP clients.

Connection management

Network reliability shouldn’t impact user experience. use-mcp manages connection retries and reconnections with a backoff schedule to ensure your client can recover the connection during a network issue and continue where it left off. The hook exposes real-time connection states ("connecting", "ready", "failed"), allowing you to build responsive UIs that keep users informed without requiring you to write any custom connection handling logic.

Authentication & authorization

Many MCP servers require some form of authentication in order to make tool calls. use-mcp supports OAuth 2.1 and handles the entire OAuth flow. It redirects users to the login page, allows them to grant access, securely stores the access token returned by the OAuth provider, and uses it for all subsequent requests to the server. The library also provides methods for users to revoke access and clear stored credentials. This gives you a complete authentication system that allows you to securely connect to remote MCP servers, without writing any of the logic.

Dynamic tool discovery

When you connect to an MCP server, use-mcp fetches the tools it exposes. If the server adds new capabilities, your app will see them without any code changes. Each tool provides type-safe metadata about its required inputs and functionality, so your client can automatically validate user input and make the right tool calls.

Debugging & monitoring capabilities

To help you troubleshoot MCP integrations, use-mcp exposes a log array containing structured messages at debug, info, warn, and error levels, with timestamps for each one. You can enable detailed logging with the debug option to track tool calls, authentication flows, connection state changes, and errors. This real-time visibility makes it easier to diagnose issues during development and production.

Future-proofed & backwards compatible

MCP is evolving rapidly, with recent updates to transport mechanisms and upcoming changes to authorization. use-mcp supports both Server-Sent Events (SSE) and the newer Streamable HTTP transport, automatically detecting and upgrading to newer protocols, when supported by the MCP server.

As the MCP specification continues to evolve, we'll keep the library updated with the latest standards, while maintaining backwards compatibility. We are also excited to contribute use-mcp to the MCP project, so it can grow with help from the wider community.

MCP Inspector, built with use-mcp

In use-mcp’s examples directory, you’ll see a minimal MCP Inspector that was built with the use-mcp hook. . Enter any MCP server URL to test connections, see available tools, and monitor interactions through the debug logs. It's a great starting point for building your own MCP clients or something you can use to debug connections to your MCP server.

Open-sourcing the AI Playground

We initially built the AI Playground to give users a chat interface for testing different AI models supported by Workers AI. We then added MCP support, so it could be used as a remote MCP client to connect to and test MCP servers. Today, we're open-sourcing the playground, giving you the complete chat interface with the MCP client built in, so you can deploy it yourself and customize it to fit your needs.

The playground comes with built-in support for the latest MCP standards, including both Streamable HTTP and Server-Sent Events transport methods, OAuth authentication flows that allow users to sign-in and grant permissions, as well as support for bearer token authentication for direct MCP server connections.

How the AI Playground works

The AI Playground is built on Workers AI, giving you access to a full catalog of large language models (LLMs) running on Cloudflare's network, combined with the Agents SDK and use-mcp library for MCP server connections.

The AI Playground uses the use-mcp library to manage connections to remote MCP servers. When the playground starts up, it initializes the MCP connection system with const{tools: mcpTools} = useMcp(), which provides access to all tools from connected servers. At first, this list is empty because it’s not connected to any MCP servers, but once a connection to a remote MCP server is established, the tools are automatically discovered and populated into the list.

Once connected, the playground immediately has access to any tools that the MCP server exposes. The use-mcp library handles all the protocol communication and tool discovery, and maintains the connection state. If the MCP server requires authentication, the playground handles OAuth flows through a dedicated callback page that uses onMcpAuthorization from use-mcp to complete the authentication process.

When a user sends a chat message, the playground takes the mcpTools from the use-mcp hook and passes them directly to Workers AI, enabling the model to understand what capabilities are available and invoke them as needed.

Debugging and monitoring

To monitor and debug connections to MCP servers, we’ve added a Debug Log interface to the playground. This displays real-time information about the MCP server connections, including connection status, authentication state, and any connection errors.

During the chat interactions, the debug interface will show the raw message exchanged between the playground and the MCP server, including the tool invocation and its result. This allows you to monitor the JSON payload being sent to the MCP server, the raw response returned, and track whether the tool call succeeded or failed. This is especially helpful for anyone building remote MCP servers, as it allows you to see how your tools are behaving when integrated with different language models.

Contributing to the MCP ecosystem

One of the reasons why MCP has evolved so quickly is that it's an open source project, powered by the community. We're excited to contribute the use-mcp library to the MCP ecosystem to enable more developers to build remote MCP clients.

If you're looking for examples of MCP clients or MCP servers to get started with, check out the Cloudflare AI GitHub repository for working examples you can deploy and modify. This includes the complete AI Playground source code, a number of remote MCP servers that use different authentication & authorization providers, and the MCP Inspector.

We’re also building the Cloudflare MCP servers in public and welcome contributions to help make them better.

Whether you're building your first MCP server, integrating MCP into an existing application, or contributing to the broader ecosystem, we'd love to hear from you. If you have any questions, feedback, or ideas for collaboration, you can reach us via email at 1800-mcp@cloudflare.com.

Making Cloudflare the best platform for building AI Agents

Rita Kozlov — Tue, 25 Feb 2025 14:00:00 GMT

As engineers, we’re obsessed with efficiency and automating anything we find ourselves doing more than twice. If you’ve ever done this, you know that the happy path is always easy, but the second the inputs get complex, automation becomes really hard. This is because computers have traditionally required extremely specific instructions in order to execute.

The state of AI models available to us today has changed that. We now have access to computers that can reason, and make judgement calls in lieu of specifying every edge case under the sun.

That’s what AI agents are all about.

Today we’re excited to share a few announcements on how we’re making it even easier to build AI agents on Cloudflare, including:

agents-sdk — a new JavaScript framework for building AI agents
Updates to Workers AI: structured outputs, tool calling, and longer context windows for Workers AI, Cloudflare’s serverless inference engine
An update to the workers-ai-provider for the AI SDK

We truly believe that Cloudflare is the ideal platform for building Agents and AI applications (more on why below), and we’re constantly working to make it better — you can expect to see more announcements from us in this space in the future.

Before we dive deep into the announcements, we wanted to give you a quick primer on agents. If you are familiar with agents, feel free to skip ahead.

What are agents?

Agents are AI systems that can autonomously execute tasks by making decisions about tool usage and process flow. Unlike traditional automation that follows predefined paths, agents can dynamically adapt their approach based on context and intermediate results. Agents are also distinct from co-pilots (e.g. traditional chat applications) in that they can fully automate a task, as opposed to simply augmenting and extending human input.

Agents → non-linear, non-deterministic (can change from run to run)
Workflows → linear, deterministic execution paths
Co-pilots → augmentative AI assistance requiring human intervention

Example: booking vacations

If this is your first time working with, or interacting with agents, this example will illustrate how an agent works within a context like booking a vacation.

Imagine you're trying to book a vacation. You need to research flights, find hotels, check restaurant reviews, and keep track of your budget.

Traditional workflow automation

A traditional automation system follows a predetermined sequence: it can take inputs such as dates, location, and budget, and make calls to predefined APIs in a fixed order. However, if any unexpected situations arise, such as flights being sold out, or the specified hotels being unavailable, it cannot adapt.

AI co-pilot

A co-pilot acts as an intelligent assistant that can provide hotel and itinerary recommendations based on your preferences. If you have questions, it can understand and respond to natural language queries and offer guidance and suggestions. However, it is unable to take the next steps to execute the end-to-end action on its own.

Agent

An agent combines AI's ability to make judgements and call the relevant tools to execute the task. An agent's output will be nondeterministic given: real-time availability and pricing changes, dynamic prioritization of constraints, ability to recover from failures, and adaptive decision-making based on intermediate results. In other words, if flights or hotels are unavailable, an agent can reassess and suggest a new itinerary with altered dates or locations, and continue executing your travel booking.

agents-sdk — the framework for building agents

You can now add agent powers to any existing Workers project with just one command:

… or if you want to build something from scratch, you can bootstrap your project with the agents-starter template:

agents-sdk is a framework that allows you to build agents — software that can autonomously execute tasks — and deploy them directly into production on Cloudflare Workers.

Your agent can start with the basics and act on HTTP requests…

Although this is just the initial release of agents-sdk, we wanted to ship more than just a thin wrapper over an existing library. Agents can communicate with clients in real time, persist state, execute long-running tasks on a schedule, send emails, run asynchronous workflows, browse the web, query data from your Postgres database, call AI models, and support human-in-the-loop use-cases. All of this works today, out of the box.

For example, you can build a powerful chat agent with the AIChatAgent class:

… and connect to your Agent with any React-based front-end with the useAgent hook that can automatically establish a bidirectional WebSocket, sync client state, and allow you to build Agent-based applications without a mountain of bespoke code:

We spent some time thinking about the production story here too: an agent framework that absolves itself of the hard parts — durably persisting state, handling long-running tasks & loops, and horizontal scale — is only going to get you so far. Agents built with agents-sdk can be deployed directly to Cloudflare and run on top of Durable Objects — which you can think of as stateful micro-servers that can scale to tens of millions — and are able to run wherever they need to. Close to a user for low-latency, close to your data, and/or anywhere in between.

agents-sdk also exposes:

Integration with React applications via a useAgent hook that can automatically set up a WebSocket connection between your app and an agent
An AIChatAgent extension that makes it easier to build intelligent chat agents
State management APIs via this.setState as well as a native sql API for writing and querying data within each Agent
State synchronization between frontend applications and the agent state
Agent routing, enabling agent-per-user or agent-per-workflow use-cases. Spawn millions (or tens of millions) of agents without having to think about how to make the infrastructure work, provision CPU, or scale out storage.

Over the coming weeks, expect to see even more here: tighter integration with email APIs to enable more human-in-the-loop use-cases, hooks into WebRTC for voice & video interactivity, a built-in evaluation (evals) framework, and the ability to self-host agents on your own infrastructure.

We’re aiming high here: we think this is just the beginning of what agents are capable of, and we think we can make Workers the best place (but not the only place) to build & run them.

JSON mode, longer context windows, and improved tool calling in Workers AI

When users express needs conversationally, tool calling converts these requests into structured formats like JSON that APIs can understand and process, allowing the AI to interact with databases, services, and external systems. This is essential for building agents, as it allows users to express complex intentions in natural language, and AI to decompose these requests, call appropriate tools, evaluate responses and deliver meaningful outcomes.

When using tool calling or building AI agents, the text generation model must respond with valid JSON objects rather than natural language. Today, we're adding JSON mode support to Workers AI, enabling applications to request a structured output response when interacting with AI models. Here's a request to @cf/meta/llama-3.1-8b-instruct-fp8-fast using JSON mode:

And here’s how the model will respond:

As you can see, the model is complying with the JSON schema definition in the request and responding with a validated JSON object. JSON mode is compatible with OpenAI’s response_format implementation:

This is the list of models that now support JSON mode:

We will continue extending this list to keep up with new, and requested models.

Lastly, we are changing how we restrict the size of AI requests to text generation models, moving from byte-counts to token-counts, introducing the concept of context window and raising the limits of the models in our catalog.

In generative AI, the context window is the sum of the number of input, reasoning, and completion or response tokens a model supports. You can now find the context window limit on each model page in our developer documentation and decide which suits your requirements and use case.

JSON mode is also the perfect companion when using function calling. You can use structured JSON outputs with traditional function calling or the Vercel AI SDK via the workers-ai-provider.

workers-ai-provider 0.1.1

One of the most common ways to build with AI tooling today is by using the popular AI SDK. Cloudflare’s provider for the AI SDK makes it easy to use Workers AI the same way you would call any other LLM, directly from your code.

In the most recent version, we’ve shipped the following improvements:

Tool calling enabled for generateText
Streaming now works out of the box
Usage statistics are now enabled
You can now use AI Gateway, even when streaming

A key part of building agents is using LLMs for routing, and making decisions on which tools to call next, and summarizing structured and unstructured data. All of these things need to happen quickly, as they are on the critical path of the user-facing experience.

Workers AI, with its globally distributed fleet of GPUs, is a perfect fit for smaller, low-latency LLMs, so we’re excited to make it easy to use with tools developers are already familiar with.

Why build agents on Cloudflare?

Since launching Workers in 2017, we’ve been building a platform to allow developers to build applications that are fast, scalable, and cost-efficient from day one. We took a fundamentally different approach from the way code was previously run on servers, making a bet about what the future of applications was going to look like — isolates running on a global network, in a way that was truly serverless. No regions, no concurrency management, no managing or scaling infrastructure.

The release of Workers was just the beginning, and we continued shipping primitives to extend what developers could build. Some more familiar, like a key-value store (Workers KV), and some that we thought would play a role in enabling net new use cases like Durable Objects. While we didn’t quite predict AI agents (though “Agents” was one of the proposed names for Durable Objects), we inadvertently created the perfect platform for building them.

What do we mean by that?

A platform that only charges you for what you use (regardless of how long it takes)

To be able to run agents efficiently, you need a system that can seamlessly scale up and down to support the constant stop, go, wait patterns. Agents are basically long-running tasks, sometimes waiting on slow reasoning LLMs and external tools to execute. With Cloudflare, you don’t have to pay for long-running processes when your code is not executing. Cloudflare Workers is designed to scale down and only charge you for CPU time, as opposed to wall-clock time.

In many cases, especially when calling LLMs, the difference can be in orders of magnitude — e.g. 2–3 milliseconds of CPU vs. 10 seconds of wall-clock time. When building on Workers, we pass that difference on to you as cost savings.

Serverless AI Inference

We took a similar serverless approach when it comes to inference itself. When you need to call an AI model, you need it to be instantaneously available. While the foundation model providers offer APIs that make it possible to just call the LLM, if you’re running open-source models, LoRAs, or self-trained models, most cloud providers today require you to pre-provision resources for what your peak traffic will look like. This means that the rest of the time, you’re still paying for GPUs to sit there idle. With Workers AI, you can pay only when you’re calling our inference APIs, as opposed to unused infrastructure. In fact, you don’t have to think about infrastructure at all, which is the principle at the core of everything we do.

A platform designed for durable execution

Durable Objects and Workflows provide a robust programming model that ensures guaranteed execution for asynchronous tasks that require persistence and reliability. This makes them ideal for handling complex operations like long-running deep thinking LLM calls, human-in-the-loop approval processes, or interactions with unreliable third-party APIs. By maintaining state across requests and automatically handling retries, these tools create a resilient foundation for building sophisticated AI agents that can perform complex, multistep tasks without losing context or progress, even when operations take significant time to complete.

Lastly, new and updated agents documentation

Did you catch all of that?

No worries if not: we’ve updated our agents documentation to include everything we talked about above, from breaking down the basics of agents, to showing you how to tackle foundational examples of building with agents.

We’ve also updated our Workers prompt with knowledge of the agents-sdk library, so you can use Cursor, Windsurf, Zed, ChatGPT or Claude to help you build AI Agents and deploy them to Cloudflare.

Can’t wait to see what you build!

We’re just getting started, and we love to see all that you build. Please join our Discord, ask questions, and tell us what you’re building.

Cloudflare acquires PartyKit to allow developers to build real-time multi-user applications

Sunil Pai — Fri, 05 Apr 2024 13:00:56 GMT

We're thrilled to announce that PartyKit, an open source platform for deploying real-time, collaborative, multiplayer applications, is now a part of Cloudflare. This acquisition marks a significant milestone in our journey to redefine the boundaries of serverless computing, making it more dynamic, interactive, and, importantly, stateful.

Defining the future of serverless compute around state

Building real-time applications on the web have always been difficult. Not only is it a distributed systems problem, but you need to provision and manage infrastructure, databases, and other services to maintain state across multiple clients. This complexity has traditionally been a barrier to entry for many developers, especially those who are just starting out.

We announced Durable Objects in 2020 as a way of building synchronized real time experiences for the web. Unlike regular serverless functions that are ephemeral and stateless, Durable Objects are stateful, allowing developers to build applications that maintain state across requests. They also act as an ideal synchronization point for building real-time applications that need to maintain state across multiple clients. Combined with WebSockets, Durable Objects can be used to build a wide range of applications, from multiplayer games to collaborative drawing tools.

In 2022, PartyKit began as a project to further explore the capabilities of Durable Objects and make them more accessible to developers by exposing them through familiar components. In seconds, you could create a project that configured behavior for these objects, and deploy it to Cloudflare. By integrating with popular libraries such as Yjs (the gold standard in collaborative editing) and React, PartyKit made it possible for developers to build a wide range of use cases, from multiplayer games to collaborative drawing tools, into their applications.

Building experiences with real-time components was previously only accessible to multi-billion dollar companies, but new computing primitives like Durable Objects on the edge make this accessible to regular developers and teams. With PartyKit now under our roof, we're doubling down on our commitment to this future — a future where serverless is stateful.

We’re excited to give you a preview into our shared vision for applications, and the use cases we’re excited to simplify together.

Making state for serverless easy

Unlike conventional approaches that rely on external databases to maintain state, thereby complicating scalability and increasing costs, PartyKit leverages Cloudflare's Durable Objects to offer a seamless model where stateful serverless functions can operate as if they were running on a single machine, maintaining state across requests. This innovation not only simplifies development but also opens up a broader range of use cases, including real-time computing, collaborative editing, and multiplayer gaming, by allowing thousands of these "machines" to be spun up globally, each maintaining its own state. PartyKit aims to be a complement to traditional serverless computing, providing a more intuitive and efficient method for developing applications that require stateful behavior, thereby marking the "next evolution" of serverless computing.

Simplifying WebSockets for Real-Time Interaction

WebSockets have revolutionized how we think about bidirectional communication on the web. Yet, the challenge has always been about scaling these interactions to millions without a hitch. Cloudflare Workers step in as the hero, providing a serverless framework that makes real-time applications like chat services, multiplayer games, and collaborative tools not just possible but scalable and efficient.

Powering Games and Multiplayer Applications Without Limits

Imagine building multiplayer platforms where the game never lags, the collaboration is seamless, and video conferences are crystal clear. Cloudflare's Durable Objects morph the stateless serverless landscape into a realm where persistent connections thrive. PartyKit's integration into this ecosystem means developers now have a powerhouse toolkit to bring ambitious multiplayer visions to life, without the traditional overheads.

This is especially critical in gaming — there are few areas where low-latency and real-time interaction matter more. Every millisecond, every lag, every delay defines the entire experience. With PartyKit's capabilities integrated into Cloudflare, developers will be able to leverage our combined technologies to create gaming experiences that are not just about playing but living the game, thanks to scalable, immersive, and interactive platforms.

The toolkit for building Local-First applications

The Internet is great, and increasingly always available, but there are still a few situations where we are forced to disconnect — whether on a plane, a train, or a beach.

The premise of local-first applications is that work doesn't stop when the Internet does. Wherever you left off in your doc, you can keep working on it, assuming the state will be restored when you come back online. By storing data on the client and syncing when back online, these applications offer resilience and responsiveness that's unmatched. Cloudflare's vision, enhanced by PartyKit's technology, aims to make local-first not just an option but the standard for application development.

What's next for PartyKit users?

Users can expect their existing projects to continue working as expected. We will be adding more features to the platform, including the ability to create and use PartyKit projects inside existing Workers and Pages projects. There will be no extra charges to use PartyKit for commercial purposes, other than the standard usage charges for Cloudflare Workers and other services. Further, we're going to expand the roadmap to begin working on integrations with popular frameworks and libraries, such as React, Vue, and Angular. We're deeply committed to executing on the PartyKit vision and roadmap, and we're excited to see what you build with it.

The Beginning of a New Chapter

The acquisition of PartyKit by Cloudflare isn't just a milestone for our two teams; it's a leap forward for developers everywhere. Together, we're not just building tools; we're crafting the foundation for the next generation of Internet applications. The future of serverless is stateful, and with PartyKit's expertise now part of our arsenal, we're more ready than ever to make that future a reality.

Welcome to the Cloudflare team, PartyKit. Look forward to building something remarkable together.

10 things I love about Wrangler v2.0

Sunil Pai — Mon, 09 May 2022 13:00:07 GMT

Last November, we announced the beta release of a full rewrite of Wrangler, our CLI for building Cloudflare Workers. Since then, we’ve been working round the clock to make sure it's feature complete, bug-free, and easy to use. We are proud to announce that Wrangler goes public today for general usage, and can’t wait to see what people build with it!

Rewrites can be scary. Our goal for this version of Wrangler was backward compatibility with the original version, while significantly improving the developer experience. I'd like to take this opportunity to present 10 reasons why you should upgrade to the new Wrangler!

1. It's simpler to install:

A simpler way to get started.

Previously, folks would have to install @cloudflare/wrangler globally on a system. This made it hard to use different versions of Wrangler across projects. Further, it was hard to install on some CI systems because of lack of access to a user's root folder. Sometimes, folks would forget to add the @cloudflare scope when installing, confusing them when a completely unrelated package was installed and didn't work as expected.

Let's fix that. We've simplified this by now publishing to the wrangler package, so you can run npm install wrangler and it works as expected. You can also install it locally to a project's package.json, like any other regular npm package. It also works across a much broader range of CPU architectures and operating systems, so you can use it on more machines.

This makes it a lot more convenient when starting. But why stop there?

2. Zero config startup:

Get started with zero configuration

It's now much simpler to get started with a new project. Previously, you would have to create a wrangler.toml configuration file, and fill it in with details about your cloudflare account, how the project was structured, setting up a custom build process and so on. We heard feedback from many of you who would get frustrated during this step, and how it would take many minutes before you could get to developing a simple Worker.

Let's fix that. You no longer need to create a configuration file when starting, and none of the fields are mandatory. Wrangler infers details about your account and project as you start developing, and you can add configuration incrementally when you need to.

In fact, you don't even need to install Wrangler to start! You can create a Worker (say, as index.js) and use npx (a utility that comes installed with node.js) to fetch Wrangler from the npm registry and start developing immediately!

This is great for extremely simple Workers, but why stop there?

3. `wrangler init my-worker -y`, one liner to set up a full project:

We noticed users would struggle to set up a project with Wrangler, even after they'd installed Wrangler and configured wrangler.toml. Most users want to set up a package.json, commonly use typescript to write code, and set up git to track changes in this project. So, we expanded the wrangler init command to set up a production grade project. You can optionally choose to use typescript, install the official type definitions for Workers, and use git to track changes.

My favorite trick here is to pass -y to accept all questions without asking. Try running npx wrangler init my-worker -y in your terminal today!

One line to set up a full Workers project

4. `--local` mode:

Wrangler typically runs a development server on our global network, setting up a local proxy when developing, so you can develop against a "real" environment in the cloud. This is great for making sure the code you develop will behave the same in development, and after you deploy it to production. The trade-off here is it's harder to develop code when you have a bad Internet connection, or if you're running tests on a CI machine. It's also marginally slower to iterate while coding. Users have asked us for a long time to be able to 'run' their code locally on their machines, so that they can iterate quickly and run tests in environments like CI.

Wrangler now lets you develop on your machine by simply calling wrangler dev --local, and no additional configuration. This is powered by Miniflare, a fully featured simulator of the Cloudflare Workers runtime. You can even toggle across 'edge' and 'local' modes by tapping the 'L' hotkey when developing; however you prefer!

Local mode, powered by Miniflare.

5. Tail any Worker, any time:

Tail your logs anywhere, anytime.

It's useful to be able to "tail" a Worker's output to a terminal, and see what's going on in real time. While you can already view these logs in the Workers dashboard, some people are more comfortable seeing the logs in their terminal, and then slicing and dicing to debug any issues that may be occuring. Previously, you would have to checkout a Worker's repository locally, install dependencies, and then call wrangler tail in the project folder. This was clearly cumbersome, and relied on developer expertise to see something as simple as a Worker's logs.

Now you can simply call npx wrangler tail in your terminal, without any configuration or setup, and immediately see the logs that you expect. We use this ourselves to quickly inspect our production Workers and see what's going on inside them!

6. Better warnings and errors, everywhere:

One of the worst feelings a developer can face is being presented with an error when writing code, and not knowing how to fix it and proceed. We heard feedback from many of you who were frustrated with the lack of error messages, and how you would spend hours trying to figure out what went wrong. We've now added new error and warning messages, so you can easily spot the problems in your code. When possible, we also include steps you can follow to fix your Worker, including things that you can simply copy and paste! This makes Wrangler much more friendly to use, and we promise the experience will only get better.

7. On-demand developer tools for debugging:

We introduced initial support for debugging Workers in Wrangler in September which enables debugging a Worker directly on our global network. However, getting started with debugging was still a bit cumbersome, because you would have to start Wrangler with an --inspect flag, then open a special page in your browser (chrome://inspect), configuring it to detect Wrangler running on a special port, and then launching the debugger. This would also mean you might have lost any debugging messages that were logged before you opened the Chrome developer tools.

We fixed this. Now you don't need to pass any special flags when starting up. You can simply hit the D hotkey when developing and a developer tools instance pops up in your browser. And by buffering the messages before you even start up the devtools, you don't lose any logs or errors! You can also use VS Code developer tools to directly hook into your Worker's debugging session!

8. A modern module system:

Modern JavaScript isn't simply about the syntax that the language supports, but also writing code as modules, and leveraging the extremely broad ecosystem of community libraries and frameworks. Previously, Wrangler required that you set up webpack or a custom build with bundlers (like rollup, vite, or esbuild, to name a few) to consume libraries and modules. This introduces a lot of friction, especially when starting a new project and trying out new ideas.

Now, support for npm modules comes out of the box, with no extra configuration required! You can install any package from the npm registry, organize your own code with modules, and it all works as expected. We're also introducing an experimental node.js compatibility mode for using node.js modules that wouldn't previously work without setting up your own polyfills! This means you can use popular frameworks and libraries that you're already familiar with while focusing on delivering value to your users.

9. Closes many outstanding issues:

A rewrite should be judged not just by the new features that are implemented, but by how many existing issues are resolved. We went through hundreds of outstanding issues and bugs with Wrangler, and are happy to say that we solved almost all of them! Across the board, every command and feature got a facelift, bug fixes, and test coverage to make sure it doesn't break in the future. Developers on using Cloudflare Workers will be glad to hear that simply upgrading Wrangler will immediately fix previous concerns and problems. Which leads us to my absolute favorite feature...

10. A commitment to improve:

The effort into building wrangler v2.0, visualised.

Wrangler has always been special software for us. It represents the primary interface that developers use to interact and use Cloudflare Workers, and we have major plans for the future. We have invested time, effort and resources to make sure Wrangler is the best tool for developers to use, and we're excited to see what the future holds. This is a commitment to our users and community that we will only keep improving on this foundation, and folks can expect their feedback and concerns to be heard loud and clear.

wrangler 2.0 — a new developer experience for Cloudflare Workers

Ashcon Partovi — Tue, 16 Nov 2021 13:59:22 GMT

Much of a developer’s work is about making trade-offs: consistency versus availability, speed over correctness, you name it. While there will always be different technical trade-offs to consider, we believe there are some that you, as a developer, should never need to make.

One of those decisions is an easy-to-use development environment. Whether you’re onboarding a new developer to your team or you simply want to develop faster, it’s important that even the smallest of things are optimized for speed and simplicity.

That’s why we're excited to announce the second-generation of our developer tooling for Cloudflare Workers. It’s a new developer experience that’s out-of-the-box, lightning fast, and can even run Workers on a local machine. (Yes!)

If you’re already familiar with our existing tools, we’re not just talking about the wrangler CLI, we’re talking about its next major release: wrangler 2.0. Stick around to get a sneak-peak at the new experience.

No config? No problem

We’ve made it much easier to get started with Cloudflare Workers. All you need is a single JavaScript file to run a Worker -- no configuration needed. You don't even need to decide on a name!

When you run wrangler dev , your code is automatically bundled and deployed to a development environment. Then, you can send HTTP requests to that environment using a localhost proxy. Here’s what that looks like in-action:

Live debugging, just like that

Now there’s a completely redesigned experience for debugging a Worker. With a simple command you get access to a remote debugger, the same used by Chrome when you click “Inspect,” which provides an interactive view of logs, exceptions, and requests. It feels like your Worker is running locally, yet it’s actually running on the Cloudflare network.

A debugger that “just works” and auto-detects your changes makes all the difference when you’re just trying to code. We’ve also made a number of improvements to make the debugging experience even easier:

Keybind shortcuts, to quickly toggle features or open a window.
Support for “--public ”, to automatically serve your static assets.
Faster and more reliable file-watching.

To start a debugging session, just run: wrangler dev , then hit the “D” keybind.

Local mode? Flip a switch

Another aspect of the new debugging experience is the ability to switch to “local mode,” which runs your Worker on your local machine. In fact, you can easily switch between “network” and “local” mode with just a keybind shortcut.

How does this work? Recently, we announced that Miniflare (created by Brendan Coll), a project to locally emulate Workers in Node.js, has joined the Cloudflare organization. Miniflare is great for unit testing and situations where you’d like to debug Workers without an Internet connection. Now we’ve integrated it directly into the local development experience, so you can enjoy the benefits of both the network and your localhost!

Let us know what you think!

Serverless should be simple. We’re really excited about these improvements to the developer experience for Workers, and we have a lot more planned.

While we’re still working on wrangler 2.0, you can try the beta release by running: npm install wrangler@beta or by visiting the repository to see what we’re working on. If you’re already using wrangler to deploy existing applications, we recommend continuing to use wrangler 1.0 until the 2.0 release is officially out. We will continue to develop and maintain wrangler 1.0 until we’re finished with backwards-compatibility for 2.0.

If you’re starting a project or want to try out the new experience, we’d love to hear your feedback! Let us know what we’re missing or what you’d like to see in wrangler 2.0. You can create a feature request or start a discussion in the repository. (we’ll merge them into the existing wrangler repository when 2.0 is out of beta).

Thank you to all of our developers out there, and we look forward to seeing what you build!

Sunil Pai - The Cloudflare Blog

Project Think: building the next generation of AI agents on Cloudflare

Introducing Project Think

Long-running agents

Surviving crashes: durable execution with fibers

Delegating work: sub-agents via Facets

Conversations that persist: the Session API

From tool calls to code execution

The missing primitive: safe sandboxes

The execution ladder

Building blocks, not a framework

The platform

The Think base class

Lifecycle hooks

Persistent memory and long conversations

The full execution ladder, wired in

Self-authored extensions

Sub-agent RPC

Getting started

The third wave

Add voice to your agent

Get started with voice

How the voice pipeline works

Why voice should grow with the rest of your agent

Voice and text share the same state

Lower latency comes from...

a shorter network path

built-in streaming

A more realistic backend

Voice as an input: withVoiceInput

Voice and text on the same connection

What else can you build?

Tools and scheduling

Runtime model switching

Pipeline hooks

Telephone and transport options

Phone calls via Twilio

WebRTC

Build with us

Try it now

Sandboxing AI agents, 100x faster

And we have it.

Dynamic Worker Loader: a lean sandbox

100x faster

Unlimited scalability

Zero latency

It's all JavaScript

Tools defined in TypeScript

HTTP filtering and credential injection

Battle-hardened security

Helper libraries

Code Mode

Bundling

File manipulation

How are people using it?

Code Mode

Building custom automations

Running AI-generated applications

Pricing

Get Started

Dynamic Workers Starter

Dynamic Workers Playground

Code Mode: the better way to use MCP

What's MCP?

Anatomy of a tool call

What's wrong with this?

But MCP is still useful, because it is uniform

OK, how does it work?

Converting MCP to TypeScript

Running code in a sandbox

Dynamic Worker loading: no containers here

The Worker Loader API

Workers are better sandboxes

Faster, cheaper, disposable sandboxes

Isolated by default, but connected with bindings

No API keys to leak

Try it now!

Sign up for the production beta

Or try it locally

Building agents with OpenAI and Cloudflare’s Agents SDK

Voice as an input: `withVoiceInput`

3. Agents are composable — and that’s powerful

3. `wrangler init my-worker -y`, one liner to set up a full project:

4. `--local` mode: