Posts tagged "MCP"

Bringing more agent harnesses and frameworks to Cloudflare, starting with Flue

Thomas Gauvin — Wed, 17 Jun 2026 19:35:00 GMT

2026 is the year agent harnesses go to production. The software that controls the model’s access to the outside world — harnesses like Codex, Claude Code, OpenCode, Pi, and Project Think — has matured to the point where teams are deploying agents as real, load-bearing infrastructure, not just prototypes.

But building agents that survive production is hard.

We learned this firsthand building Project Think as our first-party agent harness. In working with our customers to run agents in production, we found a common set of distributed systems problems that every agent faces when running in the cloud. When an agent is interrupted, how can it automatically and gracefully resume from where it left off, without losing context or wasting tokens? How can agents run untrusted code securely? How can agents use the tools they were trained for?

A harness can’t solve these problems on its own. They’re tied to state, storage and compute — which means they’re dependent on the platform the agent runs on. That’s why we’re taking our learnings from hardening Project Think for production and bringing them to the Cloudflare Agents SDK as a base layer. Durable execution, dynamic code execution, a durable filesystem and dynamic workflows, now available to any harness building on Agents SDK.

At the same time, a new layer has emerged above the harness. Frameworks like Flue wrap a harness with the project structures, conventions, integrations and developer experience that make agents productive to build.

To solve these scaling challenges, there’s a new, three-layer stack that is emerging for building production-grade AI. Here is how the pieces fit together, moving from the user-facing developer experience down to the underlying platform primitives:

The framework (Flue) — the project structure, the conventions, the integrations, the CLI and the developer experience for building agents.
The harness (Pi, Project Think) — the agentic loop that calls tools, reads results, manages context and keeps going until the task is done.
The runtime/platform (the Cloudflare Agents SDK) — the compute, state, and storage primitives everything above depends on

The Agents SDK is that bottom layer: it makes primitives like durable execution available to any harness and any framework. Flue, our new open-source framework from the team behind Astro, is the first to build on it. Here’s how.

Flue

Flue shipped 1.0 Beta this week, built on the Pi harness, the same harness that OpenClaw is built on. What makes it different as an agent framework is the approach: you don’t script what your agent does, you describe what it knows. Define the context an agent needs — its model, skills, sandbox, and instructions — and it solves whatever task you give it, autonomously. There’s no orchestration loop to write.

This declarative model is what makes writing agents easy: here’s a triage agent that intercepts a bug report, reproduces it in a sandbox, and diagnoses the issue in under 25 lines.

The Flue developer experience

Flue’s power comes from the fact that agents don’t live in isolation. They are built to exist where your users already work, and integrate with your preferred tooling:

Anywhere agents: Drop your agents into Slack, GitHub, Linear, or Discord with pre-configured Channels that handle event verification and dispatch boilerplate automatically.
Headless, but UI-ready: Agents shouldn’t live in a black box. Flue agents can run completely headlessly for background tasks, but @flue/react provides native frontend hooks that stream an agent’s state, tool execution, and live messages straight into your frontend application, without you having to build custom real-time plumbing from scratch.
Ecosystem-ready: Flue makes it easy to add and upgrade integrations with commands like flue add channel slack, generating a Markdown blueprint that your own coding agent can read, modify, and cleanly integrate straight into your codebase.

Designed for production, not just prototypes

Moving an agent out of a local terminal and into a production ecosystem introduces traditional distributed systems failures. Host crashes, API timeouts from LLM providers, and unexpected restarts threaten to erase the short-term memory of a running agent turn.

Flue solves this via Durable Streams. Each event in the execution history is added to an append-only log. By processing every prompt, tool response and model choice as an unchangeable ledger, an agent’s state is never volatile. If a process dies, another simply picks up the log and continues from the exact step it left off.

Deploy anywhere, including Cloudflare

Flue is a multi-cloud framework. On Node.js, each agent runs as a long-lived process. You can deploy it to any VM or container, run it in GitHub Actions, or embed it on an existing server. But when you target Cloudflare, each agent becomes a Durable Object.

By running each Flue agent inside its own Durable Object, Cloudflare can automatically scale to as many agents as you need, each with their own isolated storage and compute. You don’t have to provision servers, manage sticky sessions, or worry about noisy neighbors. And when Flue agents are deployed to Cloudflare, they get durable execution using Agents SDK’s runFiber(), stash(), and onFiberRecovered() methods. Flue also uses @cloudflare/codemode and @cloudflare/shell for sandboxed code execution against a durable workspace.

What harnesses need out of an agentic platform

Flue’s Cloudflare target works so effectively because it maps cleanly to the core primitives we built into the Agents SDK. You can even dig into the Flue source code to understand how Pi, the underlying harness, is adapted to work on Cloudflare Agents SDK.

Here’s how Flue leverages the Agents SDK under the hood, and what it takes to run any modern agent harness reliably at scale.

Every agent harness needs durable execution

An agent turn is not a single request. The model streams tokens, calls tools, waits for results, maybe asks a human for approval, or delegates work to a subagent. That sequence can take seconds or minutes, and at any point the process can be interrupted or crash. When that happens, all of the agent state that was in memory is gone: the streaming connection, the pending tool calls, where the agent was in its turn. Sure, the conversation history is persisted on disk, but the user sees a spinner that never resolves. That’s a broken user experience.

Fibers solve this problem by providing a native checkpointing mechanism directly inside the Agent’s underlying Durable Object. runFiber() records the progress to the Durable Object’s SQLite storage before the work in the Agent turn starts and checkpoints with stash() as the turn advances. When a fresh agent instance boots after an interruption, onFiberRecovered() delivers the last checkpoint, so your agent knows a turn was interrupted, where it got to, and can decide how to continue.

Flue uses runFiber() on its Cloudflare target for exactly this. With the onFiberRecovered() hook, your harness can decide how to resume the execution of the turn, whether it attempts a full reconstruction model like Project Think that repairs turn state or whether it replays certain parts of the turn.

Executing code is better than overloading agents with tools

Agent harnesses give models access to the outside world through tools. But tool surfaces grow fast, and models get worse at selecting the right tool as the list gets longer and the context window fills up with tool definitions. A better pattern: give the model one tool that executes code. The model writes a TypeScript function that calls the APIs it needs, and the harness runs it. We wrote about this when we introduced Code Mode.

The question is where that code runs. To run LLM-generated code securely, you need a sandbox. But typical sandboxes would be slow, cost-prohibitive and inefficient to run each tool call. That’s why the Agents SDK provides @cloudflare/codemode, which wraps Dynamic Workers, to execute LLM-generated code in its own Worker isolate with only the bindings you provide.

Code Mode creates a fresh Dynamic Worker for each snippet, runs it, and discards it. Isolates start in under 10ms and $0.002 per load, resulting in drastically faster and cheaper cost of execution than booting a container every time your agent needs to execute a short piece of code. Flue uses @cloudflare/codemode on its Cloudflare target to power its code tool. The agent writes JavaScript against the workspace and runs it with Code Mode.

You don’t need a full container for most workspace tasks

Agent harnesses often need a filesystem, whether it’s to read files, write outputs, search through code and understand diffs. Coding agents in particular live in the filesystem. But if the harness is running in a serverless environment, how can it get a durable filesystem that persists across executions?

The usual answer is a container. That works, but it’s expensive for what agents mostly do. The majority of filesystem operations in an agent turn are text. Consider a review agent that reads files, greps through source code, or perhaps writes a patch. You don’t need a full Linux boot for that.

@cloudflare/shell gives your agent a durable virtual filesystem inside its Durable Object, backed by SQLite. It provides typed file operations — read, write, edit, search, grep, diff — that agent harnesses can use as tools.

Instead of calling individual tools, a Flue agent running on the Cloudflare target writes JavaScript against the workspace virtual file state API. By running more operations within the Durable Object, the agent benefits from the isolate model’s more efficient execution process, entirely avoiding container overhead:

This translates into a faster and more cost-efficient sandbox environment for agents that need to run shell and filesystem operations to get their work done. And for agents that need a full OS, to run npm install, git, or compilers, Cloudflare Containers provides that. We’re also building @cloudflare/workspace, to keep the virtual file system of a given Durable Object in sync with a container’s, allowing for seamless transition from lightweight Workers to a Linux environment only when it needs one.

Dynamic Workflows: let agents write their own workflows to repeat tasks consistently

But what happens when an agent needs to do more than read files or execute single code snippets? What happens when it needs to orchestrate a massive, multi-step pipeline that must repeat consistently over time, like a code review that successfully resolves bugs or a research workflow that produces good results? A harness can’t provide durable multi-step execution on its own. It needs the platform to persist each step, retry failures, and resume after interruptions.

This pattern is gaining traction. Claude Code recently shipped dynamic workflows, where Claude writes a JavaScript script at runtime to hand off work to dozens of subagents, and the runtime executes it durably. @cloudflare/dynamic-workflows provides this for any harness running on the Agents SDK. Your agent generates a workflow at runtime, and the Workflows engine persists each step, retries failures, and can sleep for hours or wait for external events like human approval.

From the Agent class, runWorkflow() connects your agent to the Workflows engine. The agent kicks off the workflow and can go to sleep. The workflow calls back into the agent via RPC to report progress, update state, or request approval. When the workflow finishes, the agent wakes up with the result.

Direct access to the Cloudflare ecosystem

Beyond compute and storage, agent harnesses need access to external capabilities: web browsing, email, memory, search, inference. A harness shouldn't have to integrate each of these separately, manage API keys for each, or worry about credentials leaking through agent-generated code.

The Agent class gives your harness access to the rest of Cloudflare through bindings: AI Gateway for per-agent spend tracking and limits, Browser Run for web automation, Email Service for inbox workflows, Agent Memory for persistent recall, AI Search for retrieval, Containers for workloads that need a full OS, and inference across 14+ model providers. Bindings grant capabilities without exposing credentials: your agent uses them, but the keys never enter agent-generated code.

Bring your agents to the agentic cloud

We know this approach works because it is the exact architectural foundation we used to build Project Think, our first-party agent harness. While Project Think remains our highly optimized, out-of-the-box solution for native Cloudflare agent experiences, the Agents SDK ensures that the broader open-source ecosystem can leverage those exact same battle-tested primitives, including Flue.

If you're building agents today with Flue, you can deploy in just a few clicks to Cloudflare. And if you're building your own agent harness or you’re building an agent framework, target the Agents SDK and get the platform integration for free.

Agents SDK: developers.cloudflare.com/agents
Flue: flueframework.com, npm install @flue/runtime
Think: docs
Cloudflare Community: community.cloudflare.com

The AI engineering stack we built internally — on the platform we ship

Ayush Thakur — Mon, 20 Apr 2026 13:00:00 GMT

In the last 30 days, 93% of Cloudflare’s R&D organization used AI coding tools powered by infrastructure we built on our own platform.

Eleven months ago, we undertook a major project: to truly integrate AI into our engineering stack. We needed to build the internal MCP servers, access layer, and AI tooling necessary for agents to be useful at Cloudflare. We pulled together engineers from across the company to form a tiger team called iMARS (Internal MCP Agent/Server Rollout Squad). The sustained work landed with the Dev Productivity team, who also own much of our internal tooling including CI/CD, build systems, and automation.

Here are some numbers that capture our own agentic AI use over the last 30 days:

3,683 internal users actively using AI coding tools (60% company-wide, 93% across R&D), out of approximately 6,100 total employees
47.95 million AI requests
295 teams are currently utilizing agentic AI tools and coding assistants.
20.18 million AI Gateway requests per month
241.37 billion tokens routed through AI Gateway
51.83 billion tokens processed on Workers AI

The impact on developer velocity internally is clear: we’ve never seen a quarter-to-quarter increase in merge requests to this degree.

As AI tooling adoption has grown the 4-week rolling average has climbed from ~5,600/week to over 8,700. The week of March 23 hit 10,952, nearly double the Q4 baseline.

MCP servers were the starting point, but the team quickly realized we needed to go further: rethink how standards are codified, how code gets reviewed, how engineers onboard, and how changes propagate across thousands of repos.

This post dives deep into what that looked like over the past eleven months and where we ended up. We're publishing now, to close out Agents Week, because the AI engineering stack we built internally runs on the same products we’re shipping and enhancing this week.

The architecture at a glance

The engineer-facing tools layer (OpenCode, Windsurf, and other MCP-compatible clients) include both open-source and third-party coding assistant tools.

Each layer maps to a Cloudflare product or tool we use:

None of this is internal-only infrastructure. Everything (besides Backstage) listed above is a shipping product, and many of them got substantial updates during Agents Week.

We’ll walk through this in three acts:

The platform layer — how authentication, routing, and inference work (AI Gateway, Workers AI, MCP Portal, Code Mode)
The knowledge layer — how agents understand our systems (Backstage, AGENTS.md)
The enforcement layer — how we keep quality high at scale (AI Code Reviewer, Engineering Codex)

Act 1: The platform layer

How AI Gateway helped us stay secure and improve the developer experience

When you have over 3,600+ internal users using AI coding tools daily, you need to solve for access and visibility across many clients, use cases, and roles.

Everything starts with Cloudflare Access, which handles all authentication and zero-trust policy enforcement. Once authenticated, every LLM request routes through AI Gateway. This gives us a single place to manage provider keys, cost tracking, and data retention policies.

^{The OpenCode AI Gateway overview: 688.46k requests per day, 10.57B tokens per day, routing to four providers through one endpoint.}

AI Gateway analytics show how monthly usage is distributed across model providers. Over the last month, internal request volume broke down as follows.

Frontier models handle the bulk of complex agentic coding work for now, but Workers AI is already a significant part of the mix and handles an increasing share of our agentic engineering workloads.

How we increasingly leverage Workers AI

Workers AI is Cloudflare's serverless AI inference platform which runs open-source models on GPUs across our global network. Beyond huge cost improvements compared to frontier models, a key advantage is that inference stays on the same network as your Workers, Durable Objects, and storage. No cross-cloud hops to deal with, which cause more latency, network flakiness, and additional networking configuration to manage.

^{Workers AI usage in the last month: 51.47B input tokens, 361.12M output tokens.}

Kimi K2.5, launched on Workers AI in March 2026, is a frontier-scale open-source model with a 256k context window, tool calling, and structured outputs. As we described in our Kimi K2.5 launch post, we have a security agent that processes over 7 billion tokens per day on Kimi. That would cost an estimated $2.4M per year on a mid-tier proprietary model. But on Workers AI, it's 77% cheaper.

Beyond security, we use Workers AI for documentation review in our CI pipeline, for generating AGENTS.md context files across thousands of repositories, and for lightweight inference tasks where same-network latency matters more than peak model capability.

As open-source models continue to improve, we expect Workers AI to handle a growing share of our internal workloads.

One thing we got right early: routing through a single proxy Worker from day one. We could have had clients connect directly to AI Gateway, which would have been simpler to set up initially. But centralizing through a Worker meant we could add per-user attribution, model catalog management, and permission enforcement later without touching any client configs. Every feature described in the bootstrap section below exists because we had that single choke point. The proxy pattern gives you a control plane that direct connections don't, and if we plug in additional coding assistant tools later, the same Worker and discovery endpoint will handle them.

How it works: one URL to configure everything

The entire setup starts with one command:

That command triggers a chain that configures providers, models, MCP servers, agents, commands, and permissions, without the user touching a config file.

Step 1: Discover auth requirements. OpenCode fetches config from a URL like https://opencode.internal.domain/.well-known/opencode.

This discovery endpoint is served by a Worker and the response has an auth block telling OpenCode how to authenticate, along with a config block with providers, MCP servers, agents, commands, and default permissions:

Step 2: Authenticate via Cloudflare Access. OpenCode runs the auth command and the user authenticates through the same SSO they use for everything else at Cloudflare. cloudflared returns a signed JWT. OpenCode stores it locally and automatically attaches it to every subsequent provider request.

Step 3: Config is merged into OpenCode. The config provided is shared defaults for the entire organization, but local configs always take priority. Users can override the default model, add their own agents, or adjust project and user scoped permissions without affecting anyone else.

Inside the proxy Worker. The Worker is a simple Hono app that does three things:

Serves the shared config. The config is compiled at deploy time from structured source files and contains placeholder values like {baseURL} for the Worker's origin. At request time, the Worker replaces these, so all provider requests route through the Worker rather than directly to model providers. Each provider gets a path prefix (/anthropic, /openai, /google-ai-studio/v1beta, /compat for Workers AI) that the Worker forwards to the corresponding AI Gateway route.
Proxies requests to AI Gateway. When OpenCode sends a request like POST /anthropic/v1/messages, the Worker validates the Cloudflare Access JWT, then rewrites headers before forwarding:

The request goes to AI Gateway, which routes it to the appropriate provider. The response passes straight through with zero buffering. The apiKey field in the client config is empty because the Worker injects the real key server-side. No API keys exist on user machines.
Keeps the model catalog fresh. An hourly cron trigger fetches the current OpenAI model list from models.dev, caches it in Workers KV, and injects store: false on every model for Zero Data Retention. New models get ZDR automatically without a config redeploy.

Anonymous user tracking. After JWT validation, the Worker maps the user's email to a UUID using D1 for persistent storage and KV as a read cache. AI Gateway only ever sees the anonymous UUID in cf-aig-metadata, never the email. This gives us per-user cost tracking and usage analytics without exposing identities to model providers or Gateway logs.

Config-as-code. Agents and commands are authored as markdown files with YAML frontmatter. A build script compiles them into a single JSON config validated against the OpenCode JSON schema. Every new session picks up the latest version automatically.

The overall architecture is simple and easy for anyone to deploy with our developer platform: a proxy Worker, Cloudflare Access, AI Gateway, and a client-accessible discovery endpoint that configures everything automatically. Users run one command and they're done. There’s nothing for them to configure manually, no API keys on laptops or MCP server connections to manually set up. Making changes to our agentic tools and updating what 3,000+ people get in their coding environment is just a wrangler deploy away.

The MCP Server Portal: one OAuth, multiple MCP tools

We described our full approach to governing MCP at enterprise scale in a separate post, including how we use MCP Server Portals, Cloudflare Access, and Code Mode together. Here's the short version of what we built internally.

Our internal portal aggregates 13 production MCP servers exposing 182+ tools across Backstage, GitLab, Jira, Sentry, Elasticsearch, Prometheus, Google Workspace, our internal Release Manager, and more. This unifies access and simplifies everything giving us one endpoint and one Cloudflare Access flow governing access to every tool.

Each MCP server is built on the same foundation: McpAgent from the Agents SDK, workers-oauth-provider for OAuth, and Cloudflare Access for identity. The whole thing lives in a single monorepo with shared auth infrastructure, Bazel builds, CI/CD pipelines, and catalog-info.yaml for Backstage registration. Adding a new server is mostly copying an existing one and changing the API it wraps. For more on how this works and the security architecture behind it, see our enterprise MCP reference architecture.

Code Mode at the portal layer

MCP is the right protocol for connecting AI agents to tools, but it has a practical problem: every tool definition consumes context window tokens before the model even starts working. As the number of MCP servers and tools grows, so does the token overhead, and at scale, this becomes a real cost. Code Mode is the emerging fix: instead of loading every tool schema up front, the model discovers and calls tools through code.

Our GitLab MCP server originally exposed 34 individual tools (get_merge_request, list_pipelines, get_file_content, and so on). Those 34 tool schemas consumed roughly 15,000 tokens of context window per request. On a 200K context window, that's 7.5% of the budget gone before asking a question. Multiplied across every request, every engineer, every day, it adds up.

MCP Server Portals now support Code Mode proxying, which lets us solve that problem centrally instead of one server at a time. Rather than exposing every upstream tool definition to the client, the portal collapses them into two portal-level tools: portal_codemode_search and portal_codemode_execute.

The nice thing about doing this at the portal layer is that it scales cleanly. Without Code Mode, every new MCP server adds more schema overhead to every request. With portal-level Code Mode, the client still only sees two tools even as we connect more servers behind the portal. That means less context bloat, lower token cost, and a cleaner architecture overall.

Act 2: The knowledge layer

Backstage: the knowledge graph underneath all of it

Before the iMARS team could build MCP servers that were actually useful, we needed to solve a more fundamental problem: structured data about our services and infrastructure. We need our agents to understand context outside the code base, like who owns what, how services depend on each other, where the documentation lives, and what databases a service talks to.

We run Backstage, the open-source internal developer portal originally built by Spotify, as our service catalog. It's self-hosted (not on Cloudflare products, for the record) and it tracks things like:

2,055 services, 167 libraries, and 122 packages
228 APIs with schema definitions
544 systems (products) across 45 domains
1,302 databases, 277 ClickHouse tables, 173 clusters
375 teams and 6,389 users with ownership mappings
Dependency graphs connecting services to the databases, Kafka topics, and cloud resources they rely on

Our Backstage MCP server (13 tools) is available through our MCP Portal, and an agent can look up who owns a service, check what it depends on, find related API specs, and pull Tech Insights scores, all without leaving the coding session.

Without this structured data, agents are working blind. They can read the code in front of them, but they can't see the system around it. The catalog turns individual repos into a connected map of the engineering organization.

AGENTS.md: getting thousands of repos ready for AI

Early in the rollout, we kept seeing the same failure mode: coding agents produced changes that looked plausible and were still wrong for the repo. Usually the problem was local context: the model didn't know the right test command, the team's current conventions, or which parts of the codebase were off-limits. That pushed us toward AGENTS.md: a short, structured file in each repo that tells coding agents how the codebase actually works and forces teams to make that context explicit.

What AGENTS.md looks like

We built a system that generates AGENTS.md files across our GitLab instance. Because these files sit directly in the model's context window, we wanted them to stay short and high-signal. A typical file looks like this:

When an agent reads this file, it doesn't have to infer the repo from scratch. It knows how the codebase is organized, which conventions to follow and which Engineering Codex rules apply.

How we generate them at scale

The generator pipeline pulls entity metadata from our Backstage service catalog (ownership, dependencies, system relationships), analyzes the repository structure to detect the language, build system, test framework, and directory layout, then maps the detected stack to relevant Engineering Codex standards. A capable model then generates the structured document, and the system opens a merge request so the owning team can review and refine it.

We've processed roughly 3,900 repositories this way. The first pass wasn't always perfect, especially for polyglot repos or unusual build setups, but even that baseline was much better than asking agents to infer everything from scratch.

The initial merge request solved the bootstrap problem, but keeping these files current mattered just as much. A stale AGENTS.md can be worse than no file at all. We closed that loop with the AI Code Reviewer, which can flag when repository changes suggest that AGENTS.md should be updated.

Act 3: The enforcement layer

The AI Code Reviewer

Every merge request at Cloudflare gets an AI code review. Integration is straightforward: teams add a single CI component to their pipeline, and from that point every MR is reviewed automatically.

We use GitLab's self-hosted solution as our CI/CD platform. The reviewer is implemented as a GitLab CI component that teams include in their pipeline. When an MR is opened or updated, the CI job runs OpenCode with a multi-agent review coordinator. The coordinator classifies the MR by risk tier (trivial, lite, or full) and delegates to specialized review agents: code quality, security, codex compliance, documentation, performance, and release impact. Each agent connects to the AI Gateway for model access, pulls Engineering Codex rules from a central repo, and reads the repository's AGENTS.md for codebase context. Results are posted back as structured MR comments.

A separate Workers-based config service handles centralized model selection per reviewer agent, so we can shift models without changing the CI template. The review process itself runs in the CI runner and is stateless per execution.

The output format

We spent time getting the output format right. Reviews are broken into categories (Security, Code Quality, Performance) so engineers can scan headers rather than reading walls of text. Each finding has a severity level (Critical, Important, Suggestion, or Optional Nits) that makes it immediately clear what needs attention versus what's informational.

The reviewer maintains context across iterations. If it flagged something in a previous review round that has since been fixed, it acknowledges that rather than re-raising the same issue. And when a finding maps to an Engineering Codex rule, it cites the specific rule ID, turning an AI suggestion into a reference to an organizational standard.

Workers AI handles about 15% of the reviewer's traffic, primarily for documentation review tasks where Kimi K2.5 performs well at a fraction of the cost of frontier models. Models like Opus 4.6 and GPT 5.4 handle security-sensitive and architecturally complex reviews where reasoning capability matters most.

Over the last 30 days:

100% AI code reviewer coverage across all repos on our standard CI pipeline.
5.47M AI Gateway requests
24.77B tokens processed

We're releasing a detailed technical blog post alongside this one that covers the reviewer's internal architecture, including how we route between models, the multi-agent orchestration, and the cost optimization strategies we've developed.

Engineering Codex: engineering standards as agent skills

The Engineering Codex is Cloudflare's new internal standards system where our core engineering standards live. We have a multi-stage AI distillation process, which outputs a set of codex rules ("If you need X, use Y. You must do X, if you are doing Y or Z.") along with an agent skill that uses progressive disclosure and nested hierarchical information directories and links across markdown files.

This skill is available for engineers to use locally as they build with prompts like “how should I handle errors in my Rust service?” or “review this TypeScript code for compliance.” Our Network Firewall team audited rampartd using a multi-agent consensus process where every requirement was scored COMPLIANT, PARTIAL, or NON-COMPLIANT with specific violation details and remediation steps reducing what previously required weeks of manual work to a structured, repeatable process.

At review time, the AI Code Reviewer cites specific Codex rules in its feedback.

^{AI Code Review: showing categorized findings (Codex Compliance in this case) noting the codex RFC violation.}

None of these pieces are especially novel on their own. Plenty of companies run service catalogs, ship reviewer bots, or publish engineering standards. The difference is the wiring. When an agent can pull context from Backstage, read AGENTS.md for the repo it’s editing, and get reviewed against Codex rules by the same toolchain, the first draft is usually close enough to ship. That wasn’t true six months ago.

The scoreboard

From launching this effort to 93% R&D adoption took less than a year.

Company-wide adoption (Feb 5 – April 15, 2026):

AI Gateway (last 30 days, combined):

Workers AI (last 30 days):

What's next: background agents

The next evolution in our internal engineering stack will include background agents: agents that can be spun up on demand with the same tools available locally (MCP portal, git, test runners) but running entirely in the cloud. The architecture uses Durable Objects and the Agents SDK for orchestration, delegating to Sandbox containers when the job requires a full development environment like cloning a repo, installing dependencies, or running tests. The Sandbox SDK went GA during Agents Week.

Long-running agents, shipped natively into the Agents SDK during Agents Week, solve the durable session problem that previously required workarounds. The SDK now supports sessions that run for extended periods without eviction, enough for an agent to clone a large repo, run a full test suite, iterate on failures, and open a MR in a single session.

This represents an eleven-month effort to rethink not just how code gets written, but how it gets reviewed, how standards are enforced, and how changes ship safely across thousands of repos. Every layer runs on the same products our customers use.

Start building

Agents Week just shipped everything you need. The platform is here.

That agents starter gets you running. The diagram below is the full architecture for when you’re ready to grow it, your tools layer on top (chatbot, web UI, CLI, browser extension), the Agents SDK handling session state and orchestration in the middle, and the Cloudflare services you call from it underneath.

Docs: Agents SDK · Sandbox SDK · AI Gateway · Workers AI · Workflows · Code Mode · MCP on Cloudflare

Repos: cloudflare/agents · cloudflare/sandbox-sdk · cloudflare/mcp-server-cloudflare · cloudflare/skills

For more on how we’re using AI at Cloudflare, read the post on our process for AI Code Review. And check out everything we shipped during Agents Week.

We’d love to hear what you build. Find us on Discord, X, and Bluesky.

^{Ayush Thakur built the AGENTS.md system and the AI Gateway integration for the OpenCode infrastructure, Scott Roemeschke is the Engineering Manager of the Developer Productivity team at Cloudflare, Rajesh Bhatia leads the Productivity Platform function at Cloudflare. This post was a collaborative effort across the Devtools team, with help from volunteers across the company through the iMARS (Internal MCP Agent/Server Rollout Squad) tiger team.}

Building the agentic cloud: everything we launched during Agents Week 2026

Ming Lu — Mon, 20 Apr 2026 13:00:00 GMT

Today marks the end of our first Agents Week, an innovation week dedicated entirely to the age of agents. It couldn’t have been more timely: over the past year, agents have swiftly changed how people work. Coding agents are helping developers ship faster than ever. Support agents resolve tickets end-to-end. Research agents validate hypotheses across hundreds of sources in minutes. And people aren't just running one agent: they're running several in parallel and around the clock.

As Cloudflare's CTO Dane Knecht and VP of Product Rita Kozlov noted in our welcome to Agents Week post, the potential scale of agents is staggering: If even a fraction of the world's knowledge workers each run a few agents in parallel, you need compute capacity for tens of millions of simultaneous sessions. The one-app-serves-many-users model the cloud was built on doesn't work for that. But that's exactly what developers and businesses want to do: build agents, deploy them to users, and run them at scale.

Getting there means solving problems across the entire stack. Agents need compute that scales from full operating systems to lightweight isolates. They need security and identity built into how they run. They need an agent toolbox: the right models, tools, and context to do real work. All the code that agents generate needs a clear path from afternoon prototype to production app. And finally, as agents drive a growing share of Internet traffic, the web itself needs to adapt for the emerging agentic web. Turns out, the containerless, serverless compute platform we launched eight years ago with Workers was ready-made for this moment. Since then, we've grown it into a full platform, and this week we shipped the next wave of primitives purpose-built for agents, organized around exactly those problems.

We are here to create Cloud 2.0 — the agentic cloud. Infrastructure designed for a world where agents are a primary workload.

Here's a list of everything we announced this week — we wouldn’t want you to miss a thing.

Compute

It starts with compute. Agents need somewhere to run, and somewhere to store and run the code they write. Not all agents need the same thing: some need a full operating system to install packages and run terminal commands, most need something lightweight that starts in milliseconds and scales to millions. This week we shipped the environments to run them, as well as a new Git-compatible workspace for agents:

Security

Running agents and their code is only half the challenge. Agents connect to private networks, access internal services, and take autonomous actions on behalf of users. When anyone in an organization can spin up their own agents, security can't be an afterthought. It has to be the default. This week, we launched the tools to make that easy.

Agent Toolbox

A capable agent needs to be able to think and remember, communicate, and see. This means being powered with the right models, with access to the right tools and the right context for their task at hand. This week we shipped the primitives — inference, search, memory, voice, email, and a browser — that turn an agent into something that actually gets work done.

Prototype to production

The best infrastructure is also one that’s easy to use. We want to meet developers and their agents where they’re already working: in the terminal, in the editor, in a prompt, and make the full Cloudflare platform accessible without context-switching.

Agentic Web

As more agents come online, they're still browsing an Internet that was built for people. Existing websites need new tools to control what bots can access their content, package and present it for agents, and measure how ready they are for this shift.

That’s a wrap

Agents Week 2026 is ending, but the agentic cloud is just getting started. Everything we shipped this week — from compute and security to the agent toolbox and the agentic web — is the foundation. We're going to keep building on it to give you everything you need to build what's next.

We also have more blog posts coming out today and tomorrow to continue the story, so keep an eye out for the latest at our blog.

If you're building on any of what we announced this week, we want to hear about it. Come find us on X or Discord, or head to the developer documentation.

Scaling MCP adoption: Our reference architecture for simpler, safer and cheaper enterprise deployments of MCP

Sharon Goldberg — Tue, 14 Apr 2026 13:00:10 GMT

We at Cloudflare have aggressively adopted Model Context Protocol (MCP) as a core part of our AI strategy. This shift has moved well beyond our engineering organization, with employees across product, sales, marketing, and finance teams now using agentic workflows to drive efficiency in their daily tasks. But the adoption of agentic workflow with MCP is not without its security risks. These range from authorization sprawl, prompt injection, and supply chain risks. To secure this broad company-wide adoption, we have integrated a suite of security controls from both our Cloudflare One (SASE) platform and our Cloudflare Developer platform, allowing us to govern AI usage with MCP without slowing down our workforce.

In this blog we’ll walk through our own best practices for securing MCP workflows, by putting different parts of our platform together to create a unified security architecture for the era of autonomous AI. We’ll also share two new concepts that support enterprise MCP deployments:

We are launching Code Mode with MCP server portals, to drastically reduce token costs associated with MCP usage;
We describe how to use Cloudflare Gateway for Shadow MCP detection, to discover use of unauthorized remote MCP servers.

We also talk about how our organization approached deploying MCP, and how we built out our MCP security architecture using Cloudflare products including remote MCP servers, Cloudflare Access, MCP server portals and AI Gateway.

Remote MCP servers provide better visibility and control

MCP is an open standard that enables developers to build a two-way connection between AI applications and the data sources they need to access. In this architecture, the MCP client is the integration point with the LLM or other AI agent, and the MCP server sits between the MCP client and the corporate resources.

The separation between MCP clients and MCP servers allows agents to autonomously pursue goals and take actions while maintaining a clear boundary between the AI (integrated at the MCP client) and the credentials and APIs of the corporate resource (integrated at the MCP server).

Our workforce at Cloudflare is constantly using MCP servers to access information in various internal resources, including our project management platform, our internal wiki, documentation and code management platforms, and more.

Very early on, we realized that locally-hosted MCP servers were a security liability. Local MCP server deployments may rely on unvetted software sources and versions, which increases the risk of supply chain attacks or tool injection attacks. They prevent IT and security administrators from administrating these servers, leaving it up to individual employees and developers to choose which MCP servers they want to run and how they want to keep them up to date. This is a losing game.

Instead, we have a centralized team at Cloudflare that manages our MCP server deployment across the enterprise. This team built a shared MCP platform inside our monorepo that provides governed infrastructure out of the box. When an employee wants to expose an internal resource via MCP, they first get approval from our AI governance team, and then they copy a template, write their tool definitions, and deploy, all the while inheriting default-deny write controls with audit logging, auto-generated CI/CD pipelines, and secrets management for free. This means standing up a new governed MCP server is minutes of scaffolding. The governance is baked into the platform itself, which is what allowed adoption to spread so quickly.

Our CI/CD pipeline deploys them as remote MCP servers on custom domains on Cloudflare’s developer platform. This gives us visibility into which MCPs servers are being used by our employees, while maintaining control over software sources. As an added bonus, every remote MCP server on the Cloudflare developer platform is automatically deployed across our global network of data centers, so MCP servers can be accessed by our employees with low latency, regardless of where they might be in the world.

Cloudflare Access provides authentication

Some of our MCP servers sit in front of public resources, like our Cloudflare documentation MCP server or Cloudflare Radar MCP server, and thus we want them to be accessible to anyone. But many of the MCP servers used by our workforce are sitting in front of our private corporate resources. These MCP servers require user authentication to ensure that they are off limits to everyone but authorized Cloudflare employees. To achieve this, our monorepo template for MCP servers integrates Cloudflare Access as the OAuth provider. Cloudflare Access secures login flows and issues access tokens to resources, while acting as an identity aggregator that verifies end user single-sign on (SSO), multifactor authentication (MFA), and a variety of contextual attributes such as IP addresses, location, or device certificates.

MCP server portals centralize discovery and governance

^{MCP server portals unify governance and control for all AI activity.}

As the number of our remote MCP servers grew, we hit a new wall: discovery. We wanted to make it easy for every employee (especially those that are new to MCP) to find and work with all the MCP servers that are available to them. Our MCP server portals product provided a convenient solution. The employee simply connects their MCP client to the MCP server portal, and the portal immediately reveals every internal and third-party MCP servers they are authorized to use.

Beyond this, our MCP server portals provide centralized logging, consistent policy enforcement and data loss prevention (DLP guardrails). Our administrators can see who logged into what MCP portal and create DLP rules that prevent certain data, like personally identifiable data (PII), from being shared with certain MCP servers.

We can also create policies that control who has access to the portal itself, and what tools from each MCP server should be exposed. For example, we could set up one MCP server portal that is only accessible to employees that are part of our finance group that exposes just the read-only tools for the MCP server in front of our internal code repository. Meanwhile, a different MCP server portal, accessible only to employees on their corporate laptops that are in our engineering team, could expose more powerful read/write tools to our code repository MCP server.

An overview of our MCP server portal architecture is shown above. The portal supports both remote MCP servers hosted on Cloudflare, and third-party MCP servers hosted anywhere else. What makes this architecture uniquely performant is that all these security and networking components run on the same physical machine within our global network. When an employee's request moves through the MCP server portal, a Cloudflare-hosted remote MCP server, and Cloudflare Access, their traffic never needs to leave the same physical machine.

Code Mode with MCP server portals reduces costs

After months of high-volume MCP deployments, we’ve paid out our fair share of tokens. We’ve also started to think most people are doing MCP wrong.

The standard approach to MCP requires defining a separate tool for every API operation that is exposed via an MCP server. But this static and exhaustive approach quickly exhausts an agent’s context window, especially for large platforms with thousands of endpoints.

We previously wrote about how we used server-side Code Mode to power Cloudflare’s MCP server, allowing us to expose the thousands of end-points in Cloudflare API while reducing token use by 99.9%. The Cloudflare MCP server exposes just two tools: a search tool lets the model write JavaScript to explore what’s available, and an execute tool lets it write JavaScript to call the tools it finds. The model discovers what it needs on demand, rather than receiving everything upfront.

We like this pattern so much, we had to make it available for everyone. So we have now launched the ability to use the “Code Mode” pattern with MCP server portals. Now you can front all of your MCP servers with a centralized portal that performs audit controls and progressive tool disclosure, in order to reduce token costs.

Here is how it works. Instead of exposing every tool definition to a client, all of your underlying MCP servers collapse into just two MCP portal tools: portal_codemode_search and portal_codemode_execute. The search tool gives the model access to a codemode.tools() function that returns all the tool definitions from every connected upstream MCP server. The model then writes JavaScript to filter and explore these definitions, finding exactly the tools it needs without every schema being loaded into context. The execute tool provides a codemode proxy object where each upstream tool is available as a callable function. The model writes JavaScript that calls these tools directly, chaining multiple operations, filtering results, and handling errors in code. All of this runs in a sandboxed environment on the MCP server portal powered by Dynamic Workers.

Here is an example of an agent that needs to find a Jira ticket and update it with information from Google Drive. It first searches for the right tools:

The model now knows the exact tool names and parameters it needs, without the full schemas of tools ever entering its context. It then writes a single execute call to chain the operations together:

This is just two tool calls. The first discovers what's available, the second does the work. Without Code Mode, this same workflow would have required the model to receive the full schemas of every tool from both MCP servers upfront, and then make three separate tool invocations.

Let’s put the savings in perspective: when our internal MCP server portal is connected to just four of our internal MCP servers, it exposes 52 tools that consume approximately 9,400 tokens of context just for their definitions. With Code Mode enabled, those 52 tools collapse into 2 portal tools consuming roughly 600 tokens, a 94% reduction. And critically, this cost stays fixed. As we connect more MCP servers to the portal, the token cost of Code Mode doesn’t grow.

Code Mode can be activated on an MCP server portal by adding a query parameter to the URL. Instead of connecting to your portal over its usual URL (e.g. https://myportal.example.com/mcp), you attach ?codemode=search_and_execute to the URL (e.g. https://myportal.example.com/mcp?codemode=search_and_execute).

AI Gateway provides extensibility and cost controls

We aren’t done yet. We plug AI Gateway into our architecture by positioning it on the connection between the MCP client and the LLM. This allows us to quickly switch between various LLM providers (to prevent vendor lock-in) and to enforce cost controls (by limiting the number of tokens each employee can burn through). The full architecture is shown below.

Cloudflare Gateway discovers and blocks shadow MCP

Now that we’ve provided governed access to authorized MCP servers, let’s look into dealing with unauthorized MCP servers. We can perform shadow MCP discovery using Cloudflare Gateway. Cloudflare Gateway is our comprehensive secure web gateway that provides enterprise security teams with visibility and control over their employees’ Internet traffic.

We can use the Cloudflare Gateway API to perform a multi-layer scan to find remote MCP servers that are not being accessed via an MCP server portal. This is possible using a variety of existing Gateway and Data Loss Prevention (DLP) selectors, including:

Using the Gateway httpHost selector to scan for
- known MCP server hostnames using (like mcp.stripe.com)
- mcp.* subdomains using wildcard hostname patterns
Using the Gateway httpRequestURI selector to scan for MCP-specific URL paths like /mcp and /mcp/sse
Using DLP-based body inspection to find MCP traffic, even if that traffic uses URI that do not contain the telltale mentions of mcp or sse. Specifically, we use the fact that MCP uses JSON-RPC over HTTP, which means every request contains a "method" field with values like "tools/call", "prompts/get", or "initialize." Here are some regex rules that can be used to detect MCP traffic in the HTTP body:

The Gateway API supports additional automation. For example, one can use the custom DLP profile we defined above to block traffic, or redirect it, or just to log and inspect MCP payloads. Put this together, and Gateway can be used to provide comprehensive detection of unauthorized remote MCP servers accessed via an enterprise network.

For more information on how to build this out, see this tutorial.

Public-facing MCP Servers are protected with AI Security for Apps

So far, we’ve been focused on protecting our workforce’s access to our internal MCP servers. But, like many other organizations, we also have public-facing MCP servers that our customers can use to agentically administer and operate Cloudflare products. These MCP servers are hosted on Cloudflare’s developer platform. (You can find a list of individual MCPs for specific products here, or refer back to our new approach for providing more efficient access to the entire Cloudflare API using Code Mode.)

We believe that every organization should publish official, first-party MCP servers for their products. The alternative is that your customers source unvetted servers from public repositories where packages may contain dangerous trust assumptions, undisclosed data collection, and any range of unsanctioned behaviors. By publishing your own MCP servers, you control the code, update cadence, and security posture of the tools your customers use.

Since every remote MCP server is an HTTP endpoint, we can put it behind the Cloudflare Web Application Firewall (WAF). Customers can enable the AI Security for Apps feature within the WAF to automatically inspect inbound MCP traffic for prompt injection attempts, sensitive data leakage, and topic classification. Public facing MCPs are protected just as any other web API.

The future of MCP in the enterprise

We hope our experience, products, and reference architectures will be useful to other organizations as they continue along their own journey towards broad enterprise-wide adoption of MCP.

We’ve secured our own MCP workflows by:

Offering our developers a templated framework for building and deploying remote MCP servers on our developer platform using Cloudflare Access for authentication
Ensuring secure, identity-based access to authorized MCP servers by connecting our entire workforce to MCP server portals
Controlling costs using AI Gateway to mediate access to the LLMs powering our workforce’s MCP clients, and using Code Mode in MCP server portals to reduce token consumption and context bloat
Discovering shadow MCP usage by Cloudflare Gateway

For organizations advancing on their own enterprise MCP journeys, we recommend starting by putting your existing remote and third-party MCP servers behind Cloudflare MCP server portals and enabling Code Mode to start benefitting for cheaper, safer and simpler enterprise deployments of MCP.

_{Acknowledgements: This reference architecture and blog represents this work of many people across many different roles and business units at Cloudflare. This is just a partial list of contributors: Ann Ming Samborski, Kate Reznykova, Mike Nomitch, James Royal, Liam Reese, Yumna Moazzam, Simon Thorpe, Rian van der Merwe, Rajesh Bhatia, Ayush Thakur, Gonzalo Chavarri, Maddy Onyehara, and Haley Campbell.}

Sandboxing AI agents, 100x faster

Kenton Varda — Tue, 24 Mar 2026 13:00:00 GMT

Last September we introduced Code Mode, the idea that agents should perform tasks not by making tool calls, but instead by writing code that calls APIs. We've shown that simply converting an MCP server into a TypeScript API can cut token usage by 81%. We demonstrated that Code Mode can also operate behind an MCP server instead of in front of it, creating the new Cloudflare MCP server that exposes the entire Cloudflare API with just two tools and under 1,000 tokens.

But if an agent (or an MCP server) is going to execute code generated on-the-fly by AI to perform tasks, that code needs to run somewhere, and that somewhere needs to be secure. You can't just eval() AI-generated code directly in your app: a malicious user could trivially prompt the AI to inject vulnerabilities.

You need a sandbox: a place to execute code that is isolated from your application and from the rest of the world, except for the specific capabilities the code is meant to access.

Sandboxing is a hot topic in the AI industry. For this task, most people are reaching for containers. Using a Linux-based container, you can start up any sort of code execution environment you want. Cloudflare even offers our container runtime and our Sandbox SDK for this purpose.

But containers are expensive and slow to start, taking hundreds of milliseconds to boot and hundreds of megabytes of memory to run. You probably need to keep them warm to avoid delays, and you may be tempted to reuse existing containers for multiple tasks, compromising the security.

If we want to support consumer-scale agents, where every end user has an agent (or many!) and every agent writes code, containers are not enough. We need something lighter.

And we have it.

Dynamic Worker Loader: a lean sandbox

Tucked into our Code Mode post in September was the announcement of a new, experimental feature: the Dynamic Worker Loader API. This API allows a Cloudflare Worker to instantiate a new Worker, in its own sandbox, with code specified at runtime, all on the fly.

Dynamic Worker Loader is now in open beta, available to all paid Workers users.

Read the docs for full details, but here's what it looks like:

That's it.

100x faster

Dynamic Workers use the same underlying sandboxing mechanism that the entire Cloudflare Workers platform has been built on since its launch, eight years ago: isolates. An isolate is an instance of the V8 JavaScript execution engine, the same engine used by Google Chrome. They are how Workers work.

An isolate takes a few milliseconds to start and uses a few megabytes of memory. That's around 100x faster and 10x-100x more memory efficient than a typical container.

That means that if you want to start a new isolate for every user request, on-demand, to run one snippet of code, then throw it away, you can.

Unlimited scalability

Many container-based sandbox providers impose limits on global concurrent sandboxes and rate of sandbox creation. Dynamic Worker Loader has no such limits. It doesn't need to, because it is simply an API to the same technology that has powered our platform all along, which has always allowed Workers to seamlessly scale to millions of requests per second.

Want to handle a million requests per second, where every single request loads a separate Dynamic Worker sandbox, all running concurrently? No problem!

Zero latency

One-off Dynamic Workers usually run on the same machine — the same thread, even — as the Worker that created them. No need to communicate around the world to find a warm sandbox. Isolates are so lightweight that we can just run them wherever the request landed. Dynamic Workers are supported in every one of Cloudflare's hundreds of locations around the world.

It's all JavaScript

The only catch, vs. containers, is that your agent needs to write JavaScript.

Technically, Workers (including dynamic ones) can use Python and WebAssembly, but for small snippets of code — like that written on-demand by an agent — JavaScript will load and run much faster.

We humans tend to have strong preferences on programming languages, and while many love JavaScript, others might prefer Python, Rust, or countless others.

But we aren't talking about humans here. We're talking about AI. AI will write any language you want it to. LLMs are experts in every major language. Their training data in JavaScript is immense.

JavaScript, by its nature on the web, is designed to be sandboxed. It is the correct language for the job.

Tools defined in TypeScript

If we want our agent to be able to do anything useful, it needs to talk to external APIs. How do we tell it about the APIs it has access to?

MCP defines schemas for flat tool calls, but not programming APIs. OpenAPI offers a way to express REST APIs, but it is verbose, both in the schema itself and the code you'd have to write to call it.

For APIs exposed to JavaScript, there is a single, obvious answer: TypeScript.

Agents know TypeScript. TypeScript is designed to be concise. With very few tokens, you can give your agent a precise understanding of your API.

Compare this with the equivalent OpenAPI spec (which is so long you have to scroll to see it all):

We think the TypeScript API is better. It's fewer tokens and much easier to understand (for both agents and humans).

Dynamic Worker Loader makes it easy to implement a TypeScript API like this in your own Worker and then pass it in to the Dynamic Worker either as a method parameter or in the env object. The Workers Runtime will automatically set up a Cap'n Web RPC bridge between the sandbox and your harness code, so that the agent can invoke your API across the security boundary without ever realizing that it isn't using a local library.

That means your agent can write code like this:

HTTP filtering and credential injection

If you prefer to give your agents HTTP APIs, that's fully supported. Using the globalOutbound option to the worker loader API, you can register a callback to be invoked on every HTTP request, in which you can inspect the request, rewrite it, inject auth keys, respond to it directly, block it, or anything else you might like.

For example, you can use this to implement credential injection (token injection): When the agent makes an HTTP request to a service that requires authorization, you add credentials to the request on the way out. This way, the agent itself never knows the secret credentials, and therefore cannot leak them.

Using a plain HTTP interface may be desirable when an agent is talking to a well-known API that is in its training set, or when you want your agent to use a library that is built on a REST API (the library can run inside the agent's sandbox).

With that said, in the absence of a compatibility requirement, TypeScript RPC interfaces are better than HTTP:

As shown above, a TypeScript interface requires far fewer tokens to describe than an HTTP interface.
The agent can write code to call TypeScript interfaces using far fewer tokens than equivalent HTTP.
With TypeScript interfaces, since you are defining your own wrapper interface anyway, it is easier to narrow the interface to expose exactly the capabilities that you want to provide to your agent, both for simplicity and security. With HTTP, you are more likely implementing filtering of requests made against some existing API. This is hard, because your proxy must fully interpret the meaning of every API call in order to properly decide whether to allow it, and HTTP requests are complicated, with many headers and other parameters that could all be meaningful. It ends up being easier to just write a TypeScript wrapper that only implements the functions you want to allow.

Battle-hardened security

Hardening an isolate-based sandbox is tricky, as it is a more complicated attack surface than hardware virtual machines. Although all sandboxing mechanisms have bugs, security bugs in V8 are more common than security bugs in typical hypervisors. When using isolates to sandbox possibly-malicious code, it's important to have additional layers of defense-in-depth. Google Chrome, for example, implemented strict process isolation for this reason, but it is not the only possible solution.

We have nearly a decade of experience securing our isolate-based platform. Our systems automatically deploy V8 security patches to production within hours — faster than Chrome itself. Our security architecture features a custom second-layer sandbox with dynamic cordoning of tenants based on risk assessments. We've extended the V8 sandbox itself to leverage hardware features like MPK. We've teamed up with (and hired) leading researchers to develop novel defenses against Spectre. We also have systems that scan code for malicious patterns and automatically block them or apply additional layers of sandboxing. And much more.

When you use Dynamic Workers on Cloudflare, you get all of this automatically.

Helper libraries

We've built a number of libraries that you might find useful when working with Dynamic Workers:

Code Mode

@cloudflare/codemode simplifies running model-generated code against AI tools using Dynamic Workers. At its core is DynamicWorkerExecutor(), which constructs a purpose-built sandbox with code normalisation to handle common formatting errors, and direct access to a globalOutbound fetcher for controlling fetch() behaviour inside the sandbox — set it to null for full isolation, or pass a Fetcher binding to route, intercept or enrich outbound requests from the sandbox.

The Code Mode SDK also provides two server-side utility functions. codeMcpServer({ server, executor }) wraps an existing MCP Server, replacing its tool surface with a single code() tool. openApiMcpServer({ spec, executor, request }) goes further: given an OpenAPI spec and an executor, it builds a complete MCP Server with search() and execute() tools as used by the Cloudflare MCP Server, and better suited to larger APIs.

In both cases, the code generated by the model runs inside Dynamic Workers, with calls to external services made over RPC bindings passed to the executor.

Learn more about the library and how to use it.

Bundling

Dynamic Workers expect pre-bundled modules. @cloudflare/worker-bundler handles that for you: give it source files and a package.json, and it resolves npm dependencies from the registry, bundles everything with esbuild, and returns the module map the Worker Loader expects.

It also supports full-stack apps via createApp — bundle a server Worker, client-side JavaScript, and static assets together, with built-in asset serving that handles content types, ETags, and SPA routing.

Learn more about the library and how to use it.

File manipulation

@cloudflare/shell gives your agent a virtual filesystem inside a Dynamic Worker. Agent code calls typed methods on a state object — read, write, search, replace, diff, glob, JSON query/update, archive — with structured inputs and outputs instead of string parsing.

Storage is backed by a durable Workspace (SQLite + R2), so files persist across executions. Coarse operations like searchFiles, replaceInFiles, and planEdits minimize RPC round-trips — the agent issues one call instead of looping over individual files. Batch writes are transactional by default: if any write fails, earlier writes roll back automatically.

The package also ships prebuilt TypeScript type declarations and a system prompt template, so you can drop the full state API into your LLM context in a handful of tokens.

Learn more about the library and how to use it.

How are people using it?

Code Mode

Developers want their agents to write and execute code against tool APIs, rather than making sequential tool calls one at a time. With Dynamic Workers, the LLM generates a single TypeScript function that chains multiple API calls together, runs it in a Dynamic Worker, and returns the final result back to the agent. As a result, only the output, and not every intermediate step, ends up in the context window. This cuts both latency and token usage, and produces better results, especially when the tool surface is large.

Our own Cloudflare MCP server is built exactly this way: it exposes the entire Cloudflare API through just two tools — search and execute — in under 1,000 tokens, because the agent writes code against a typed API instead of navigating hundreds of individual tool definitions.

Building custom automations

Developers are using Dynamic Workers to let agents build custom automations on the fly. Zite, for example, is building an app platform where users interact through a chat interface — the LLM writes TypeScript behind the scenes to build CRUD apps, connect to services like Stripe, Airtable, and Google Calendar, and run backend logic, all without the user ever seeing a line of code. Every automation runs in its own Dynamic Worker, with access to only the specific services and libraries that the endpoint needs.

“To enable server-side code for Zite’s LLM-generated apps, we needed an execution layer that was instant, isolated, and secure. Cloudflare’s Dynamic Workers hit the mark on all three, and out-performed all of the other platforms we benchmarked for speed and library support. The NodeJS compatible runtime supported all of Zite’s workflows, allowing hundreds of third party integrations, without sacrificing on startup time. Zite now services millions of execution requests daily thanks to Dynamic Workers.”

— Antony Toron, CTO and Co-Founder, Zite

Running AI-generated applications

Developers are building platforms that generate full applications from AI — either for their customers or for internal teams building prototypes. With Dynamic Workers, each app can be spun up on demand, then put back into cold storage until it's invoked again. Fast startup times make it easy to preview changes during active development. Platforms can also block or intercept any network requests the generated code makes, keeping AI-generated apps safe to run.

Pricing

Dynamically-loaded Workers are priced at $0.002 per unique Worker loaded per day (as of this post’s publication), in addition to the usual CPU time and invocation pricing of regular Workers.

For AI-generated "code mode" use cases, where every Worker is a unique one-off, this means the price is $0.002 per Worker loaded (plus CPU and invocations). This cost is typically negligible compared to the inference costs to generate the code.

During the beta period, the $0.002 charge is waived. As pricing is subject to change, please always check our Dynamic Workers pricing for the most current information.

Get Started

If you’re on the Workers Paid plan, you can start using Dynamic Workers today.

Dynamic Workers Starter

Use this “hello world” starter to get a Worker deployed that can load and execute Dynamic Workers.

Dynamic Workers Playground

You can also deploy the Dynamic Workers Playground, where you’ll be able to write or import code, bundle it at runtime with @cloudflare/worker-bundler, execute it through a Dynamic Worker, see real-time responses and execution logs.

Dynamic Workers are fast, scalable, and lightweight. Find us on Discord if you have any questions. We’d love to see what you build!

Code Mode: the better way to use MCP

Kenton Varda — Fri, 26 Sep 2025 13:00:00 GMT

It turns out we've all been using MCP wrong.

Most agents today use MCP by directly exposing the "tools" to the LLM.

We tried something different: Convert the MCP tools into a TypeScript API, and then ask an LLM to write code that calls that API.

The results are striking:

We found agents are able to handle many more tools, and more complex tools, when those tools are presented as a TypeScript API rather than directly. Perhaps this is because LLMs have an enormous amount of real-world TypeScript in their training set, but only a small set of contrived examples of tool calls.
The approach really shines when an agent needs to string together multiple calls. With the traditional approach, the output of each tool call must feed into the LLM's neural network, just to be copied over to the inputs of the next call, wasting time, energy, and tokens. When the LLM can write code, it can skip all that, and only read back the final results it needs.

In short, LLMs are better at writing code to call MCP, than at calling MCP directly.

What's MCP?

For those that aren't familiar: Model Context Protocol is a standard protocol for giving AI agents access to external tools, so that they can directly perform work, rather than just chat with you.

Seen another way, MCP is a uniform way to:

expose an API for doing something,
along with documentation needed for an LLM to understand it,
with authorization handled out-of-band.

MCP has been making waves throughout 2025 as it has suddenly greatly expanded the capabilities of AI agents.

The "API" exposed by an MCP server is expressed as a set of "tools". Each tool is essentially a remote procedure call (RPC) function – it is called with some parameters and returns a response. Most modern LLMs have the capability to use "tools" (sometimes called "function calling"), meaning they are trained to output text in a certain format when they want to invoke a tool. The program invoking the LLM sees this format and invokes the tool as specified, then feeds the results back into the LLM as input.

Anatomy of a tool call

Under the hood, an LLM generates a stream of "tokens" representing its output. A token might represent a word, a syllable, some sort of punctuation, or some other component of text.

A tool call, though, involves a token that does not have any textual equivalent. The LLM is trained (or, more often, fine-tuned) to understand a special token that it can output that means "the following should be interpreted as a tool call," and another special token that means "this is the end of the tool call." Between these two tokens, the LLM will typically write tokens corresponding to some sort of JSON message that describes the call.

For instance, imagine you have connected an agent to an MCP server that provides weather info, and you then ask the agent what the weather is like in Austin, TX. Under the hood, the LLM might generate output like the following. Note that here we've used words in <| and |> to represent our special tokens, but in fact, these tokens do not represent text at all; this is just for illustration.

I will use the Weather MCP server to find out the weather in Austin, TX.

Upon seeing these special tokens in the output, the LLM's harness will interpret the sequence as a tool call. After seeing the end token, the harness pauses execution of the LLM. It parses the JSON message and returns it as a separate component of the structured API result. The agent calling the LLM API sees the tool call, invokes the relevant MCP server, and then sends the results back to the LLM API. The LLM's harness will then use another set of special tokens to feed the result back into the LLM:

The LLM reads these tokens in exactly the same way it would read input from the user – except that the user cannot produce these special tokens, so the LLM knows it is the result of the tool call. The LLM then continues generating output like normal.

Different LLMs may use different formats for tool calling, but this is the basic idea.

What's wrong with this?

The special tokens used in tool calls are things LLMs have never seen in the wild. They must be specially trained to use tools, based on synthetic training data. They aren't always that good at it. If you present an LLM with too many tools, or overly complex tools, it may struggle to choose the right one or to use it correctly. As a result, MCP server designers are encouraged to present greatly simplified APIs as compared to the more traditional API they might expose to developers.

Meanwhile, LLMs are getting really good at writing code. In fact, LLMs asked to write code against the full, complex APIs normally exposed to developers don't seem to have too much trouble with it. Why, then, do MCP interfaces have to "dumb it down"? Writing code and calling tools are almost the same thing, but it seems like LLMs can do one much better than the other?

The answer is simple: LLMs have seen a lot of code. They have not seen a lot of "tool calls". In fact, the tool calls they have seen are probably limited to a contrived training set constructed by the LLM's own developers, in order to try to train it. Whereas they have seen real-world code from millions of open source projects.

Making an LLM perform tasks with tool calling is like putting Shakespeare through a month-long class in Mandarin and then asking him to write a play in it. It's just not going to be his best work.

But MCP is still useful, because it is uniform

MCP is designed for tool-calling, but it doesn't actually have to be used that way.

The "tools" that an MCP server exposes are really just an RPC interface with attached documentation. We don't really have to present them as tools. We can take the tools, and turn them into a programming language API instead.

But why would we do that, when the programming language APIs already exist independently? Almost every MCP server is just a wrapper around an existing traditional API – why not expose those APIs?

Well, it turns out MCP does something else that's really useful: It provides a uniform way to connect to and learn about an API.

An AI agent can use an MCP server even if the agent's developers never heard of the particular MCP server, and the MCP server's developers never heard of the particular agent. This has rarely been true of traditional APIs in the past. Usually, the client developer always knows exactly what API they are coding for. As a result, every API is able to do things like basic connectivity, authorization, and documentation a little bit differently.

This uniformity is useful even when the AI agent is writing code. We'd like the AI agent to run in a sandbox such that it can only access the tools we give it. MCP makes it possible for the agentic framework to implement this, by handling connectivity and authorization in a standard way, independent of the AI code. We also don't want the AI to have to search the Internet for documentation; MCP provides it directly in the protocol.

OK, how does it work?

We have already extended the Cloudflare Agents SDK to support this new model!

For example, say you have an app built with ai-sdk that looks like this:

You can wrap the tools and prompt with the codemode helper, and use them in your app:

With this change, your app will now start generating and running code that itself will make calls to the tools you defined, MCP servers included. We will introduce variants for other libraries in the very near future. Read the docs for more details and examples.

Converting MCP to TypeScript

When you connect to an MCP server in "code mode", the Agents SDK will fetch the MCP server's schema, and then convert it into a TypeScript API, complete with doc comments based on the schema.

For example, connecting to the MCP server at https://gitmcp.io/cloudflare/agents, will generate a TypeScript definition like this:

This TypeScript is then loaded into the agent's context. Currently, the entire API is loaded, but future improvements could allow an agent to search and browse the API more dynamically – much like an agentic coding assistant would.

Running code in a sandbox

Instead of being presented with all the tools of all the connected MCP servers, our agent is presented with just one tool, which simply executes some TypeScript code.

The code is then executed in a secure sandbox. The sandbox is totally isolated from the Internet. Its only access to the outside world is through the TypeScript APIs representing its connected MCP servers.

These APIs are backed by RPC invocation which calls back to the agent loop. There, the Agents SDK dispatches the call to the appropriate MCP server.

The sandboxed code returns results to the agent in the obvious way: by invoking console.log(). When the script finishes, all the output logs are passed back to the agent.

Dynamic Worker loading: no containers here

This new approach requires access to a secure sandbox where arbitrary code can run. So where do we find one? Do we have to run containers? Is that expensive?

No. There are no containers. We have something much better: isolates.

The Cloudflare Workers platform has always been based on V8 isolates, that is, isolated JavaScript runtimes powered by the V8 JavaScript engine.

Isolates are far more lightweight than containers. An isolate can start in a handful of milliseconds using only a few megabytes of memory.

Isolates are so fast that we can just create a new one for every piece of code the agent runs. There's no need to reuse them. There's no need to prewarm them. Just create it, on demand, run the code, and throw it away. It all happens so fast that the overhead is negligible; it's almost as if you were just eval()ing the code directly. But with security.

The Worker Loader API

Until now, though, there was no way for a Worker to directly load an isolate containing arbitrary code. All Worker code instead had to be uploaded via the Cloudflare API, which would then deploy it globally, so that it could run anywhere. That's not what we want for Agents! We want the code to just run right where the agent is.

To that end, we've added a new API to the Workers platform: the Worker Loader API. With it, you can load Worker code on-demand. Here's what it looks like:

You can start playing with this API right now when running workerd locally with Wrangler (check out the docs), and you can sign up for beta access to use it in production.

Workers are better sandboxes

The design of Workers makes it unusually good at sandboxing, especially for this use case, for a few reasons:

Faster, cheaper, disposable sandboxes

The Workers platform uses isolates instead of containers. Isolates are much lighter-weight and faster to start up. It takes mere milliseconds to start a fresh isolate, and it's so cheap we can just create a new one for every single code snippet the agent generates. There's no need to worry about pooling isolates for reuse, prewarming, etc.

We have not yet finalized pricing for the Worker Loader API, but because it is based on isolates, we will be able to offer it at a significantly lower cost than container-based solutions.

Isolated by default, but connected with bindings

Workers are just better at handling isolation.

In Code Mode, we prohibit the sandboxed worker from talking to the Internet. The global fetch() and connect() functions throw errors.

But on most platforms, this would be a problem. On most platforms, the way you get access to private resources is, you start with general network access. Then, using that network access, you send requests to specific services, passing them some sort of API key to authorize private access.

But Workers has always had a better answer. In Workers, the "environment" (env object) doesn't just contain strings, it contains live objects, also known as "bindings". These objects can provide direct access to private resources without involving generic network requests.

In Code Mode, we give the sandbox access to bindings representing the MCP servers it is connected to. Thus, the agent can specifically access those MCP servers without having network access in general.

Limiting access via bindings is much cleaner than doing it via, say, network-level filtering or HTTP proxies. Filtering is hard on both the LLM and the supervisor, because the boundaries are often unclear: the supervisor may have a hard time identifying exactly what traffic is legitimately necessary to talk to an API. Meanwhile, the LLM may have difficulty guessing what kinds of requests will be blocked. With the bindings approach, it's well-defined: the binding provides a JavaScript interface, and that interface is allowed to be used. It's just better this way.

No API keys to leak

An additional benefit of bindings is that they hide API keys. The binding itself provides an already-authorized client interface to the MCP server. All calls made on it go to the agent supervisor first, which holds the access tokens and adds them into requests sent on to MCP.

This means that the AI cannot possibly write code that leaks any keys, solving a common security problem seen in AI-authored code today.

Try it now!

Sign up for the production beta

The Dynamic Worker Loader API is in closed beta. To use it in production, sign up today.

Or try it locally

If you just want to play around, though, Dynamic Worker Loading is fully available today when developing locally with Wrangler and workerd – check out the docs for Dynamic Worker Loading and code mode in the Agents SDK to get started.

An AI Index for all our customers

Celso Martinho — Fri, 26 Sep 2025 14:00:00 GMT

Today, we’re announcing the private beta of AI Index for domains on Cloudflare, a new type of web index that gives content creators the tools to make their data discoverable by AI, and gives AI builders access to better data for fair compensation.

With AI Index enabled on your domain, we will automatically create an AI-optimized search index for your website, and expose a set of ready-to-use standard APIs and tools including an MCP server, LLMs.txt, and a search API. Our customers will own and control that index and how it’s used, and you will have the ability to monetize access through Pay per crawl and the new x402 integrations. You will be able to use it to build modern search experiences on your own site, and more importantly, interact with external AI and Agentic providers to make your content more discoverable while being fairly compensated.

For AI builders—whether developers creating agentic applications, or AI platform companies providing foundational LLM models—Cloudflare will offer a new way to discover and retrieve web content: direct pub/sub connections to individual websites with AI Index. Instead of indiscriminate crawling, builders will be able to subscribe to specific sites that have opted in for discovery, receive structured updates as soon as content changes, and pay fairly for each access. Access is always at the discretion of the site owner.

From the individual indexes, Cloudflare will also build an aggregated layer, the Open Index, that bundles together participating sites. Builders get a single place to search across collections or the broader web, while every site still retains control and can earn from participation.

Why build an AI Index?

AI platforms are quickly becoming one of the main ways people discover information online. Whether asking a chatbot to summarize a news article or find a product recommendation, the path to that answer almost always starts with crawling original content and indexing or using that data for training. However, today, that process is largely controlled by platforms: what gets crawled, how often, and whether the site owner has any input in the matter.

Although Cloudflare now offers to monitor and control how AI services respect your access policies and how they access your content, it's still challenging to make new content visible. Content creators have no efficient way to signal to AI builders when a page is published or updated. On the other hand, for AI builders, crawling and recrawling unstructured content is costly, wastes resources, especially when you don’t know the quality and cost in advance.

We need a fairer and healthier ecosystem for content discovery and usage that bridges the gap between content creators and AI builders.

How AI Index will work

When you onboard a domain to Cloudflare, or if you have an existing domain on Cloudflare, you will have the choice to enable an AI Index. If enabled, we will automatically create an AI-optimized search index for your domain that you own and control.

As your site updates and grows, the index will evolve with it. New or updated pages will be processed in real-time using the same technology that powers Cloudflare AI Search (formerly AutoRAG) and its Website as a data source. Best of all, we will manage everything; you won't have to worry about each individual component of compute, storage resources, databases, embeddings, chunking, or AI models. Everything will happen behind the scenes, automatically.

Importantly, you will have control over what content to include or exclude from your website's index, and who can get access to your content via AI Crawl Control, ensuring that only the data you want to expose is made searchable and accessible. You also will be able to opt out of the AI Index completely; it will all be up to you.

When your AI Index is set up, you will get a set of ready-to-use APIs:

An MCP Server: Agentic applications will be able to connect directly to your site using the Model Context Protocol (MCP), making your content discoverable to agents in a standardized way. This includes support for NLWeb tools, an open project developed by Microsoft that defines a standard protocol for natural language queries on websites.
A flexible search API: This endpoint will return relevant results in structured JSON.
LLMs.txt and LLMs-full.txt: Standard files that provide LLMs with a machine-readable map of your site, following emerging open standards. These will help models understand how to use your site’s content at inference time. An example of llms.txt exists in the Cloudflare Developer Documentation.
A bulk data API: An endpoint for transferring large amounts of content efficiently, available under the rules you set. Instead of querying for every document, AI providers will be able to ingest in one shot.
Pub-sub subscriptions: AI platforms will be able to subscribe to your site’s index and receive events and content updates directly from Cloudflare in a structured format in real-time, making it easy for them to stay current without re-crawling.
Discoverability directives: In robots.txt and well-known URIs to allow AI agents and crawlers visiting your site to discover and use the available API automatically.

The index will integrate directly with AI Crawl Control, so you will be able to see who’s accessing your content, set rules, and manage permissions. And with Pay per crawl and x402 integrations, you can choose to directly monetize access to your content.

A feed of the web for AI builders

As an AI builder, you will be able to discover and subscribe to high-quality, permissioned web data through individual site’s AI indexes. Instead of sending crawlers blindly across the open Internet, you will connect via a pub/sub model: participating websites will expose structured updates whenever their content changes, and you will be able to subscribe to receive those updates in real-time. With this model, your new workflow may look something like this:

Discover websites that have opted in: Browse and filter through a directory of websites that make their indexes available through Cloudflare.
Evaluate content with metadata and metrics: Get content metadata information on various metrics (e.g., uniqueness, depth, contextual relevance, popularity) before accessing it.
Pay fairly for access: When content is valuable, platforms can compensate creators directly through Pay per crawl. These payments not only enable access but also support the continued creation of original content, helping to sustain a healthier ecosystem for discovery.
Subscribe to updates: Use pub-sub subscriptions to receive events about changes made by the website, so you know when to retrieve or crawl for new content without wasting resources on constant re-crawling.

By shifting from blind crawling to a permissioned pub/sub system for the web, AI builders save time, cut costs, and gain access to cleaner, high-quality data while content creators remain in control and are fairly compensated.

The aggregated Open Index

Individual indexes provide AI platforms with the ability to access data directly from specific sites, allowing them to subscribe for updates, evaluate value, and pay for full content access on a per-site basis. But when builders need to work at a larger scale, managing dozens or hundreds of separate subscriptions can become complex. The Open Index will provide an additional option: a bundled, opt-in collection of those indexes, featuring sophisticated features such as quality, uniqueness, originality, and depth of content filters, all accessible in one place.

The Open Index is designed to make content discovery at scale easier:

Get unified access: Query and retrieve data across many participating sites simultaneously. This reduces integration overhead and enables builders to plug into a curated collection of data, or use it as a ready-made web search layer that can be accessed at query time.
Discover broader scopes: Work with topic-specific bundles (e.g., news, documentation, scientific research) or a general discovery index covering the broader web. This makes it simple to explore new content sources you may not have identified individually.
Bottom-up monetization: Results still originate from an individual site’s AI index, with monetization flowing back to that site through Pay per crawl, helping preserve fairness and sustainability at scale.

Together, per-site AI indexes and the Open Index will provide flexibility and precise control when you want full content from individual sites (i.e., for training, AI agents, or search experiences), and broad search coverage when you need a unified search across the web.

How you can participate in the shift

With AI Index and the Cloudflare Open Index, we’re creating a model where websites decide how their content is accessed, and AI builders receive structured, reliable data at scale to build a fairer and healthier ecosystem for content discovery and usage on the Internet.

We’re starting with a private beta. If you want to enroll your website into the AI Index or access the pub/sub web feed as an AI builder, you can sign up today.

Securing the AI Revolution: Introducing Cloudflare MCP Server Portals

Kenny Johnson — Tue, 26 Aug 2025 14:05:00 GMT

Securing the AI Revolution: Introducing Cloudflare MCP Server Portals

Large Language Models (LLMs) are rapidly evolving from impressive information retrieval tools into active, intelligent agents. The key to unlocking this transformation is the Model Context Protocol (MCP), an open-source standard that allows LLMs to securely connect to and interact with any application — from Slack to Canva, to your own internal databases.

This is a massive leap forward. With MCP, an LLM client like Gemini, Claude, or ChatGPT can answer more than just "tell me about Slack." You can ask it: "What were the most critical engineering P0s in Jira from last week, and what is the current sentiment in the #engineering-support Slack channel regarding them? Then propose updates and bug fixes to merge."

This is the power of MCP: turning models into teammates.

But this great power comes with proportional risk. Connecting LLMs to your most critical applications creates a new, complex, and largely unprotected attack surface. Today, we change that. We’re excited to announce Cloudflare MCP Server Portals are now available in Open Beta. MCP Server Portals are a new capability that enable you to centralize, secure, and observe every MCP connection in your organization. This feature is part of Cloudflare One, our secure access service edge (SASE) platform that helps connect and protect your workspace.

What Exactly is the Model Context Protocol?

Think of MCP as a universal translator or a digital switchboard for AI. It’s a standardized set of rules that lets two very different types of software—LLMs and everyday applications—talk to each other effectively. It consists of two primary components:

MCP Clients: These are the LLMs you interact with, like ChatGPT, Claude, or Gemini. The client is the front end to the AI that you use to ask questions and give commands.
MCP Servers: These can be developed for any application you want to connect to your LLM. SaaS providers like Slack or Atlassian may offer MCP servers for their products, or your own developers can also build custom ones for internal tools.

Credit: Architecture Overview - Model Context Protocol

For a useful connection, MCP relies on a few other key concepts:

Resources: A mechanism for the server to give the LLM context. This could be a specific file, a database schema, or a list of users in an application.
Prompts: Standardized questions the server can ask the client to get the information it needs to fulfill a request (e.g., "Which user do you want to search for?").
Tools: These are the actions the client can ask the server to perform, like querying a database, calling an API, or sending a message.

Without MCP, your LLM is isolated. With MCP, it's integrated, capable of interacting with your entire software ecosystem in a structured and predictable way.

The Peril of an Unsecured AI Ecosystem

Think of an LLM as the most brilliant and enthusiastic junior hire you've ever had. They have boundless energy and can produce incredible work, but they lack the years of judgment to know what they shouldn't do. The current, decentralized approach to MCP is like giving that junior hire a master key to every office and server room on their first day.

It's not a matter of if something will go wrong, but when.

This "shadow AI" infrastructure is the modern equivalent of the early Internet, where every server had a public IP address, fully exposed to the world. It’s the Wild West of unmanaged connections, impossible to secure. And the risks go far beyond accidental data deletion. Attackers are actively exploiting the unique vulnerabilities of LLM-driven ecosystems:

Prompt and tool injection: This is more than just telling a model to "ignore previous instructions." Attackers are now hiding malicious commands inside the descriptions of MCP tools themselves. Consider an LLM seeking to use a seemingly harmless "WebSearch" tool. A poisoned description could trick it into also running a query against a financial database and exfiltrating the results.
Supply chain attacks: How can you trust the third-party MCP servers used by your teams? In mid-2025, a critical vulnerability (CVE-2025-6514) was discovered in a popular npm package used for MCP authentication, exposing countless servers. In another incident dubbed "NeighborJack," security researchers found hundreds of MCP servers inadvertently exposed to the public Internet because they were bound to 0.0.0.0 without a firewall, allowing for potential OS command injection and host takeover.
Privilege escalation and the "confused deputy": An attacker doesn't need to break your LLM; they just need to confuse it. In one documented case, an AI agent running with high-level privileges was tricked into executing SQL commands embedded in a support ticket. The agent, acting as a "confused deputy," couldn't distinguish the malicious SQL from the legitimate ticket data and dutifully executed the commands, compromising an entire database.
Data leakage: Without centralized controls, data can bleed between systems in unexpected ways. In June 2025, a popular team collaboration tool’s MCP integration suffered a privacy breach where a bug caused some customer information to become visible in other customers' MCP instances, forcing them to take the integration offline for two weeks.

The Solution: A Single Front Door for Your MCP Servers

You can't protect what you can't see. Cloudflare MCP Server Portals solve this problem by providing a single, centralized gateway for all your MCP servers, somewhat similar to an application launcher for single sign-on. Instead of developers distributing dozens of individual server endpoints, they register their servers with Cloudflare. You provide your users with a single, unified Portal endpoint to configure in their MCP client.

This changes the security posture and user experience immediately. By routing all MCP traffic through Cloudflare, you get:

Centralized policy enforcement: You can integrate MCP Server Portals directly into Cloudflare One. This means you can enforce the same granular access policies for your AI connections that you do for your human users. Require multi-factor authentication, check for device posture, restrict by geography, and ensure only the right users can access specific servers and tools.
Comprehensive visibility and logging: Who is accessing which MCP server and which toolsets are they engaging with? What prompts are being run? What tools are being invoked? Previously, this data was scattered across every individual server. Server Portals aggregate all MCP request logs into a single place, giving you the visibility needed to audit activity and detect anomalies before they become breaches.
A curated AI user experience based on least privilege: Administrators can now review and approve MCP servers before making them available to users through a Portal. When a user authenticates through their Portal, they are only presented with the curated list of servers and tools they are authorized to use, preventing the use of unvetted or malicious third-party servers. This approach adheres to the Zero Trust security best practice of least privilege.
Simplified user configuration: Instead of having to load individual MCP server configurations into a MCP Client, users can load a single URL that pulls down all accessible MCP Servers. This drastically simplifies how many URLs need to be shared out and known by users. As new MCP Servers are added, they become dynamically available through the portal, instead of sharing each new URL on publishing of a server.

When a user connects to their MCP Server Portal, Access prompts them to authenticate with their corporate identity provider. Once authenticated, Cloudflare enforces which MCP Servers the user has access to, regardless of the underlying server’s authorization policies.

For MCP servers with domains hosted on Cloudflare, Access policies can be used to enforce the server’s direct authorization. This is done by creating an OAuth server that is linked to the domain’s existing Access Application. For MCP servers with domains outside Cloudflare and/or hosted by a third party, they require authorization controls outside of Cloudflare Access, this is usually done using OAuth.

The Road Ahead: What's Next for AI Security

MCP Server Portals are a foundational step in our mission to secure the AI revolution. This is just the beginning. In the coming months, we plan to build on this foundation by:

Mechanisms to lock down MCP Servers: Unless an MCP Server author enforces Authorization controls, users can still technically access MCPs outside of a Portal. We will build additional enforcement mechanisms to prevent this.
Integrating with Firewall for AI: Imagine applying the power of our WAF to your MCP traffic, detecting and blocking prompt injection attacks before they ever reach your servers.
Cloudflare hosted MCP Servers: We will make it easy to deploy MCP Servers using Cloudflare’s AI Gateway. This will allow for deeper prompt filtering and controls.
Applying machine learning to detect abuse: We will layer our own machine learning models on top of your MCP logs to automatically identify anomalous behavior, such as unusual data exfiltration patterns or suspicious tool usage.
Enhancing the protocol: We are committed to working with the open-source community to strengthen the MCP standard itself, contributing to a more secure and robust ecosystem for everyone.

This is our commitment: to provide the tools you need to innovate with confidence.

Get Started Today!

Progress doesn't have to come at the expense of security. With MCP Server Portals, you can empower your teams to build the future with AI, safely. This is a critical piece of helping to build a better Internet, and we are excited to see what you will build with it.

MCP Server Portals are now available in Open Beta for all Cloudflare One customers. To get started, navigate to the Access > AI Controls page in the Zero Trust Dashboard. If you don't have an account, you can sign up today and get started with up to 50 free seats or contact our experts to explore larger deployments.

Cloudflare is also starting a user research program focused on AI security. If you are interested in previews of new functionality or want to help shape our roadmap, please express your interest here.

Best Practices for Securing Generative AI with SASE

AJ Gerstenhaber — Tue, 26 Aug 2025 14:00:00 GMT

As Generative AI revolutionizes businesses everywhere, security and IT leaders find themselves in a tough spot. Executives are mandating speedy adoption of Generative AI tools to drive efficiency and stay abreast of competitors. Meanwhile, IT and Security teams must rapidly develop an AI Security Strategy, even before the organization really understands exactly how it plans to adopt and deploy Generative AI.

IT and Security teams are no strangers to “building the airplane while it is in flight”. But this moment comes with new and complex security challenges. There is an explosion in new AI capabilities adopted by employees across all business functions — both sanctioned and unsanctioned. AI Agents are ingesting authentication credentials and autonomously interacting with sensitive corporate resources. Sensitive data is being shared with AI tools, even as security and compliance frameworks struggle to keep up.

While it demands strategic thinking from Security and IT leaders, the problem of governing the use of AI internally is far from insurmountable. SASE (Secure Access Service Edge) is a popular cloud-based network architecture that combines networking and security functions into a single, integrated service that provides employees with secure and efficient access to the Internet and to corporate resources, regardless of their location. The SASE architecture can be effectively extended to meet the risk and security needs of organizations in a world of AI.

Cloudflare’s SASE Platform is uniquely well-positioned to help IT teams govern their AI usage in a secure and responsible way — without extinguishing innovation. What makes Cloudflare different in this space is that we are one of the few SASE vendors that operate not just in cybersecurity, but also in AI infrastructure. This includes: providing AI infrastructure for developers (e.g. Workers AI, AI Gateway, remote MCP servers, Realtime AI Apps) to securing public-facing LLMs (e.g. Firewall for AI or AI Labyrinth), to allowing content creators to charge AI crawlers for access to their content, and the list goes on. Our expertise in this space gives us a unique view into governing AI usage inside an organization. It also gives our customers the opportunity to plug different components of our platform together to build out their AI and AI cybersecurity infrastructure.

This week, we are taking this AI expertise and using it to help ensure you have what you need to implement a successful AI Security Strategy. As part of this, we are announcing several new AI Security Posture Management (AI-SPM) features, including:

shadow AI reporting to gain visibility into employee’s use of AI,
confidence scoring of AI providers to manage risk,
AI prompt protection to defend against malicious inputs and prevent data loss,
out-of-band API CASB integrations with AI providers to detect misconfigurations,
new tools that untangle and secure Model Context Protocol (MCP) deployments in the enterprise.

All of these new AI-SPM features are built directly into Cloudflare’s powerful SASE platform.

And we’re just getting started. In the coming months you can expect to see additional valuable AI-SPM features launch across the Cloudflare platform, as we continue investing in making Cloudflare the best place to protect, connect, and build with AI.

What’s in this AI security guide?

In this guide, we will cover best practices for adopting generative AI in your organization using Cloudflare’s SASE (Secure Access Service Edge) platform. We start by covering how IT and Security leaders can formulate their AI Security Strategy. Then, we show how to implement this strategy using long-standing features of our SASE platform alongside the new AI-SPM features we launched this week.

This guide below is divided into three key pillars for dealing with (human) employee access to AI – Visibility, Risk Management and Data Protection — followed by additional guidelines around deploying agentic AI in the enterprise using MCP. Our objective is to help you align your security strategy with your business goals while driving adoption of AI across all your projects and teams.

And we do this all using our single SASE platform, so you don’t have to deploy and manage a complex hodgepodge of point solutions and security tools. In fact, we provide you with an overview of your AI security posture in a single dashboard, as you can see here:

AI Security Report in Cloudflare’s SASE platform

Develop your AI Security Strategy

The first step to securing AI usage is to establish your organization's level of risk tolerance. This includes pinpointing your biggest security concerns for your users and your data, along with relevant legal and compliance requirements. Relevant issues to consider include:

Do you have specific sensitive data that should not be shared with certain AI tools? (Some examples include personally identifiable information (PII), personal health information (PHI), sensitive financial data, secrets and credentials, source code or other proprietary business information.)
Are there business decisions that your employees should not be making using assistance from AI? (For instance, the EU AI Act AI prohibits the use of AI to evaluate or classify individuals based on their social behavior, personal characteristics, or personality traits.)
Are you subject to compliance frameworks that require you to produce records of the generative AI tools that your employees used, and perhaps even the prompts that your employees input into AI providers? (For example, HIPAA requires organizations to implement audit trails that records who accessed PHI and when, GDPR requires the same for PII, SOC2 requires the same for secrets and credentials.)
Do you have specific data protection requirements that require employees to use the sanctioned, enterprise version of a certain generative AI provider, and avoid certain AI tools or their consumer versions? (Enterprise AI tools often have more favorable terms of service, including shorter data retention periods, more limited data-sharing with third-parties, and/or a promise not to train AI models on user inputs.)
Do you require employees to completely avoid the use of certain AI tools, perhaps because they are unreliable, unreviewed or headquartered in a risky geography?
Are there security protections offered by your organization's sanctioned AI providers and to what extent do you plan to protect against misconfigurations of AI tools that can result in leaks of sensitive data?
What is your policy around the use of autonomous AI agents? What is your strategy for adopting the Model Context Protocol (MCP)? (The Model Context Protocol is a standard way to make information available to large language models (LLMs), similar to the way an application programming interface (API) works. It supports agentic AI that autonomously pursues goals and takes action.)

While almost every organization has relevant compliance requirements that implicate their use of generative AI, there is no “one size fits all” for addressing these issues.

Some organizations have mandates to broadly adopt AI tools of all stripes, while others require employees to interact with sanctioned AI tools only.
Some organizations are rapidly adopting the MCP, while others are not yet ready for agents to autonomously interact with their corporate resources.
Some organizations have robust requirements around data loss prevention (DLP), while others are still early in the process of deploying DLP in their organization.

Even with this diversity of goals and requirements, Cloudflare SASE provides a flexible platform for the implementation of your organization’s AI Security Strategy.

Build a solid foundation for AI Security

To implement your AI Security Strategy, you first need a solid SASE deployment.

SASE provides a unified platform that consolidates security and networking, replacing a fragmented patchwork of point solutions with a single platform that controls application visibility, user authentication, Data Loss Prevention (DLP), and other policies for access to the Internet and access to internal corporate resources. SASE is the essential foundation for an effective AI Security Strategy.

SASE architecture allows you to execute your AI security strategy by discovering and inventorying the AI tools used by your employees. With this visibility, you can proactively manage risk and support compliance requirements by monitoring AI prompts and responses to understand what data is being shared with AI tools. Robust DLP allows you to scan and block sensitive data from being entered into AI tools, preventing data leakage and protecting your organization's most valuable information. Our Secure Web Gateway (SWG) allows you to redirect traffic from unsanctioned AI providers to user education pages or to sanctioned enterprise AI providers. And our new integration of MCP tooling into our SASE platform helps you secure the deployment of agentic AI inside your organization.

If you're just starting your SASE journey, our Secure Internet Traffic Deployment Guide is the best place to begin. For this guide, however, we will skip these introductory details and dive right into using SASE to secure the use of Generative AI.

Gain visibility into your AI landscape

You can't protect what you can't see. The first step is to gain visibility into your AI landscape, which is essential for discovering and inventorying all the AI tools that your employees are using, deploying or experimenting with in your organization.

Discover Shadow AI

Shadow AI refers to the use of AI applications that haven't been officially sanctioned by your IT department. Shadow AI is not an uncommon phenomenon – Salesforce found that over half of the knowledge workers it surveyed admitted to using unsanctioned AI tools at work. Use of unsanctioned AI is not necessarily a sign of malicious intent; employees are often just trying to do their jobs better. As an IT or Security leader, your goal should be to discover Shadow AI and then apply the appropriate AI security policy. There are two powerful ways to do this: inline and out-of-band.

Discover employee usage of AI, inline

The most direct way to get visibility is by using Cloudflare's Secure Web Gateway (SWG).

SWG helps you get a clear picture of both sanctioned and unsanctioned AI and chat applications. By reviewing your detected usage, you'll gain insight into which AI apps are being used in your organization. This knowledge is essential for building policies that support approved tools, and block or control risky ones. This feature requires you to deploy the WARP client in Gateway proxy mode on your end-user devices.

You can review your company’s AI app usage using our new Application Library and Shadow IT dashboards. These tools allow you to:

Review traffic from user devices to understand how many users engage with a specific application over time.
Denote application’s status (e.g., Approved, Unapproved) inside your organization, and use that as input to a variety of SWG policies that control access to applications with that status.
Automate assessment of SaaS and Gen AI applications at scale with our soon-to-be-released Cloudflare Application Confidence Scores.

^{Shadow IT dashboard showing utilization of applications of different status (Approved, Unapproved, In Review, Unreviewed).}

Discover employee usage of AI, out-of-band

Even if your organization doesn't use a device client, you can still get valuable data on Shadow AI usage if you use Cloudflare's integrations for Cloud Access Security Broker (CASB) with services like Google Workspace, Microsoft 365, or GitHub.

Cloudflare CASB provides high-fidelity detail about your SaaS environments, including sensitive data visibility and suspicious user activity. By integrating CASB with your SSO provider, you can see if your users have authenticated to any third-party AI applications, giving you a clear and non-invasive sense of app usage across your organization.

^{An API CASB integration with Google Workspace, showing findings filtered to third party integrations. Findings discover multiple LLM integrations.}

Implement an AI risk management framework

Now that you’ve gained visibility into your AI landscape, the next step is to proactively manage that risk. Cloudflare’s SASE platform allows you to monitor AI prompts and responses, enforce granular security policies, coach users on secure behavior, and prevent misconfigurations in your enterprise AI providers.

Detect and monitor AI prompts and responses

If you have TLS decryption enabled in your SASE platform, you can gain new and powerful insights into how your employees are using AI with our new AI prompt protection feature.

AI Prompt Protection provides you with visibility into the exact prompts and responses from your employees’ interactions with supported AI applications. This allows you to go beyond simply knowing which tools are being used and gives you insight into exactly what kind of information is being shared.

This feature also works with DLP profiles to detect sensitive data in prompts. You can also choose whether to block the action or simply monitor it.

^{Log entry for a prompt detected using AI prompt protection.}

Build granular AI security policies

Once your monitoring tools give you a clear understanding of AI usage, you can begin building security policies to achieve your security goals. Cloudflare's Gateway allows you to create policies based on application categories, application approval status, users, user groups, and device status. For example, you can:

create policies to explicitly allow approved AI applications while blocking unapproved AI applications;
create policies that redirect users from unapproved AI applications to an approved AI application;
limit access to certain applications to specific users or groups that have specific device security posture;
build policies to enable prompt capture (with AI prompt protection) for specific high-risk user groups, such as contractors or new employees, without affecting the rest of the organization; and
put certain applications behind Remote Browser Isolation (RBI), to prevent end users from uploading files or pasting data into the application.

^{Gateway application status policy selector}

All of these policies can be written in Cloudflare Gateway’s unified policy builder, making it easy to deploy your AI Security Strategy across your organization.

Control access to internal LLMs

You can use Cloudflare Access to control your employees’ access to your organization’s internal LLMs, including any proprietary models you train internally and/or models that your organization runs on Cloudflare Worker’s AI.

Cloudflare Access allows you to gate access to these LLMs using fine-grained policies, including ensuring users are granted access based on their identity, user group, device posture, and other contextual signals. For example, you can use Cloudflare Access to write a policy that ensures that only certain data scientists at your organization can access a Workers AI model that is trained on certain types of customer data.

Manage the security posture of third-party AI providers

As you define which AI tools are sanctioned, you can develop functional security controls for consistent usage. Cloudflare newly supports API CASB integrations with popular AI tools like OpenAI (ChatGPT), Anthropic (Claude), and Google Gemini. These "out-of-band" integrations provide immediate visibility into how users are engaging with sanctioned AI tools, allowing you to report on posture management findings include:

Misconfigurations related to sharing settings.
Best practices for API key management.
DLP profile matches in uploaded attachments
Riskier AI features (e.g. autonomous web browsing, code execution) that are toggled on

^{OpenAI API CASB Integration showing riskier features that are toggled on, security posture risks like unused admin credentials, and an uploaded attachment with a DLP profile match.}

Layer on data protection

Robust data protection is the final pillar that protects your employee’s access to AI..

Prevent data loss

Our SASE platform has long supported Data Loss Prevention (DLP) tools that scan and block sensitive data from being entered into AI tools, to prevent data leakage and protect your organization's most valuable information. You can write policies that detect sensitive data while adapting to organization-specific traffic patterns, and use Cloudflare Gateway’s unified policy builder to apply these to your users' interactions with AI tools or other applications. For example, you could write a DLP policy that detects and blocks the upload of a social security number (SSN), phone number or address.

As part of our new AI prompt protection feature, you can now also gain a semantic understanding of your users’ interactions with supported AI providers. Prompts are classified inline into meaningful, high-level topics that include PII, credentials and secrets, source code, financial information, code abuse / malicious code and prompt injection / jailbreak. You can then build inline granular policies based on these high-level topic classifications. For example, you could create a policy that blocks a non-HR employee from submitting a prompt with the intent to receive PII from the response, while allowing the HR team to do so during a compensation planning cycle.

Our new AI prompt protection feature empowers you to apply smart, user-specific DLP rules that empower your teams to get work done, all while strengthening your security posture. To use our most advanced DLP feature, you'll need to enable TLS decryption to inspect traffic.

^{The above policy blocks all ChatGPT prompts that may receive PII back in the response for employees in engineering, marketing, product, and finance user groups.}

Secure MCP — and Agentic AI

MCP (Model Context Protocol) is an emerging AI standard, where MCP servers act as a translation layer for AI agents, allowing them to communicate with public and private APIs, understand datasets, and perform actions. Because these servers are a primary entry point for AI agents to engage with and manipulate your data, they are a new and critical security asset for your security team to manage.

Cloudflare already offers a robust set of developer tools for deploying remote MCP servers—a cloud-based server that acts as a bridge between a user's data and tools and various AI applications. But now our customers are asking for help securing their enterprise MCP deployments.

That is why we’re making MCP security controls a core part of our SASE platform.

Control MCP Authorization

MCP servers typically use OAuth for authorization, where the server inherits the permissions of the authorizing user. While this adheres to least-privilege for the user, it can lead to authorization sprawl — where the agent accumulates an excessive number of permissions over time. This makes the agent a high-value target for attackers.

Cloudflare Access now helps you manage authorization sprawl by applying Zero Trust principles to MCP server access. A Zero Trust model assumes no user, device, or network can be trusted implicitly, so every request is continuously verified. This approach ensures secure authentication and management of these critical assets as your business adopts more agentic workflows.

Centralize management of MCP servers

Cloudflare MCP Server Portal is a new feature in Cloudflare’s SASE platform that centralizes the management, security, and observation of an organization’s MCP servers.

MCP Server Portal allows you to register all your MCP servers with Cloudflare and provide your end users with a single, unified Portal endpoint to configure in their MCP client. This approach simplifies the user experience, because it eliminates the need to configure a one-to-one connection between every MCP client and server. It also means that new MCP servers dynamically become available to users whenever they are added to the Portal.

Beyond these usability enhancements, MCP Server Portal addresses the significant security risks associated with MCP in the enterprise. The current decentralized approach of MCP deployments creates a tangle of unmanaged one-to-one connections that are difficult to secure. The lack of centralized controls creates a variety of risks including prompt injection, tool injection (where malicious code is part of the MCP server itself), supply chain attacks and data leakage.

MCP Server Portals solve this by routing all MCP traffic through Cloudflare, allowing for centralized policy enforcement, comprehensive visibility and logging, and a curated user experience based on the principle of least privilege. Administrators can review and approve MCP servers before making them available, and users are only presented with the servers and tools they are authorized to use, which prevents the use of unvetted or malicious third-party servers.

^{An MCP Server Portal in the Cloudflare Dashboard}

All of these features are only the beginning of our MCP security roadmap, as we continue advancing our support for MCP infrastructure and security controls across the entire Cloudflare platform.

Implement your AI security strategy in a single platform

As organizations rapidly develop and deploy their AI security strategies, Cloudflare’s SASE platform is ideally situated to implement policies that balance productivity with data and security controls.

Our SASE has a full suite of features to protect employee interactions with AI. Some of these features are deeply integrated in our Secure Web Gateway (SWG), including the ability to write fine-grained access policies, gain visibility into Shadow IT and introspect on interactions with AI tools using AI prompt protection. Apart from these inline controls, our CASB provides visibility and control using out-of-band API integrations. Our Cloudflare Access product can apply Zero Trust principles while protecting employee access to corporate LLMs that are hosted on Workers AI or elsewhere. We’re newly integrating controls for securing MCP that can also be used alongside Cloudflare’s Remote MCP Server platform.

And all of these features are integrated directly into Cloudflare’s SASE’s unified dashboard, providing a unified platform for you to implement your AI security strategy. You can even gain a holistic view of all of your AI-SPM controls using our newly-released AI-SPM overview dashboard.

^{AI security report showing utilization of AI applications.}

As one the few SASE vendors that also offer AI infrastructure, Cloudflare’s SASE platform can also be deployed alongside products from our developer and application security platforms to holistically implement your AI security strategy alongside your AI infrastructure strategy (using, for example, Workers AI, AI Gateway, remote MCP servers, Realtime AI Apps, Firewall for AI, AI Labyrinth, or pay per crawl .)

Cloudflare is committed to helping enterprises securely adopt AI

Ensuring AI is scalable, safe, and secure is a natural extension of Cloudflare’s mission, given so much of our success relies on a safe Internet. As AI adoption continues to accelerate, so too does our mission to provide a market-leading set of controls for AI Security Posture Management (AI-SPM). Learn more about how Cloudflare helps secure AI or start exploring our new AI-SPM features in Cloudflare’s SASE dashboard today!

Connect any React application to an MCP server in three lines of code

Dina Kozlov — Wed, 18 Jun 2025 13:00:00 GMT

You can deploy a remote Model Context Protocol (MCP) server on Cloudflare in just one-click. Don’t believe us? Click the button below.

This will get you started with a remote MCP server that supports the latest MCP standards and is the reason why thousands of remote MCP servers have been deployed on Cloudflare, including ones from companies like Atlassian, Linear, PayPal, and more.

But deploying servers is only half of the equation — we also wanted to make it just as easy to build and deploy remote MCP clients that can connect to these servers to enable new AI-powered service integrations. That's why we built use-mcp, a React library for connecting to remote MCP servers, and we're excited to contribute it to the MCP ecosystem to enable more developers to build remote MCP clients.

Today, we're open-sourcing two tools that make it easy to build and deploy MCP clients:

use-mcp — A React library that connects to any remote MCP server in just 3 lines of code, with transport, authentication, and session management automatically handled. We're excited to contribute this library to the MCP ecosystem to enable more developers to build remote MCP clients.
The AI Playground — Cloudflare’s AI chat interface platform that uses a number of LLM models to interact with remote MCP servers, with support for the latest MCP standard, which you can now deploy yourself.

Whether you're building an AI-powered chat bot, a support agent, or an internal company interface, you can leverage these tools to connect your AI agents and applications to external services via MCP.

Ready to get started? Click on the button below to deploy your own instance of Cloudflare’s AI Playground to see it in action.

use-mcp: a React library for building remote MCP clients

use-mcp is a React library that abstracts away all the complexity of building MCP clients. Add the useMCP() hook into any React application to connect to remote MCP servers that users can interact with.

Here’s all the code you need to add to connect to a remote MCP server:

Just specify the URL, and you're instantly connected.

Behind the scenes, use-mcp handles the transport protocols (both Streamable HTTP and Server-Sent Events), authentication flows, and session management. It also includes a number of features to help you build reliable, scalable, and production-ready MCP clients.

Connection management

Network reliability shouldn’t impact user experience. use-mcp manages connection retries and reconnections with a backoff schedule to ensure your client can recover the connection during a network issue and continue where it left off. The hook exposes real-time connection states ("connecting", "ready", "failed"), allowing you to build responsive UIs that keep users informed without requiring you to write any custom connection handling logic.

Authentication & authorization

Many MCP servers require some form of authentication in order to make tool calls. use-mcp supports OAuth 2.1 and handles the entire OAuth flow. It redirects users to the login page, allows them to grant access, securely stores the access token returned by the OAuth provider, and uses it for all subsequent requests to the server. The library also provides methods for users to revoke access and clear stored credentials. This gives you a complete authentication system that allows you to securely connect to remote MCP servers, without writing any of the logic.

Dynamic tool discovery

When you connect to an MCP server, use-mcp fetches the tools it exposes. If the server adds new capabilities, your app will see them without any code changes. Each tool provides type-safe metadata about its required inputs and functionality, so your client can automatically validate user input and make the right tool calls.

Debugging & monitoring capabilities

To help you troubleshoot MCP integrations, use-mcp exposes a log array containing structured messages at debug, info, warn, and error levels, with timestamps for each one. You can enable detailed logging with the debug option to track tool calls, authentication flows, connection state changes, and errors. This real-time visibility makes it easier to diagnose issues during development and production.

Future-proofed & backwards compatible

MCP is evolving rapidly, with recent updates to transport mechanisms and upcoming changes to authorization. use-mcp supports both Server-Sent Events (SSE) and the newer Streamable HTTP transport, automatically detecting and upgrading to newer protocols, when supported by the MCP server.

As the MCP specification continues to evolve, we'll keep the library updated with the latest standards, while maintaining backwards compatibility. We are also excited to contribute use-mcp to the MCP project, so it can grow with help from the wider community.

MCP Inspector, built with use-mcp

In use-mcp’s examples directory, you’ll see a minimal MCP Inspector that was built with the use-mcp hook. . Enter any MCP server URL to test connections, see available tools, and monitor interactions through the debug logs. It's a great starting point for building your own MCP clients or something you can use to debug connections to your MCP server.

Open-sourcing the AI Playground

We initially built the AI Playground to give users a chat interface for testing different AI models supported by Workers AI. We then added MCP support, so it could be used as a remote MCP client to connect to and test MCP servers. Today, we're open-sourcing the playground, giving you the complete chat interface with the MCP client built in, so you can deploy it yourself and customize it to fit your needs.

The playground comes with built-in support for the latest MCP standards, including both Streamable HTTP and Server-Sent Events transport methods, OAuth authentication flows that allow users to sign-in and grant permissions, as well as support for bearer token authentication for direct MCP server connections.

How the AI Playground works

The AI Playground is built on Workers AI, giving you access to a full catalog of large language models (LLMs) running on Cloudflare's network, combined with the Agents SDK and use-mcp library for MCP server connections.

The AI Playground uses the use-mcp library to manage connections to remote MCP servers. When the playground starts up, it initializes the MCP connection system with const{tools: mcpTools} = useMcp(), which provides access to all tools from connected servers. At first, this list is empty because it’s not connected to any MCP servers, but once a connection to a remote MCP server is established, the tools are automatically discovered and populated into the list.

Once connected, the playground immediately has access to any tools that the MCP server exposes. The use-mcp library handles all the protocol communication and tool discovery, and maintains the connection state. If the MCP server requires authentication, the playground handles OAuth flows through a dedicated callback page that uses onMcpAuthorization from use-mcp to complete the authentication process.

When a user sends a chat message, the playground takes the mcpTools from the use-mcp hook and passes them directly to Workers AI, enabling the model to understand what capabilities are available and invoke them as needed.

Debugging and monitoring

To monitor and debug connections to MCP servers, we’ve added a Debug Log interface to the playground. This displays real-time information about the MCP server connections, including connection status, authentication state, and any connection errors.

During the chat interactions, the debug interface will show the raw message exchanged between the playground and the MCP server, including the tool invocation and its result. This allows you to monitor the JSON payload being sent to the MCP server, the raw response returned, and track whether the tool call succeeded or failed. This is especially helpful for anyone building remote MCP servers, as it allows you to see how your tools are behaving when integrated with different language models.

Contributing to the MCP ecosystem

One of the reasons why MCP has evolved so quickly is that it's an open source project, powered by the community. We're excited to contribute the use-mcp library to the MCP ecosystem to enable more developers to build remote MCP clients.

If you're looking for examples of MCP clients or MCP servers to get started with, check out the Cloudflare AI GitHub repository for working examples you can deploy and modify. This includes the complete AI Playground source code, a number of remote MCP servers that use different authentication & authorization providers, and the MCP Inspector.

We’re also building the Cloudflare MCP servers in public and welcome contributions to help make them better.

Whether you're building your first MCP server, integrating MCP into an existing application, or contributing to the broader ecosystem, we'd love to hear from you. If you have any questions, feedback, or ideas for collaboration, you can reach us via email at 1800-mcp@cloudflare.com.

Thirteen new MCP servers from Cloudflare you can use today

Nevi Shah — Thu, 01 May 2025 13:01:19 GMT

You can now connect to Cloudflare's first publicly available remote Model Context Protocol (MCP) servers from Claude.ai (now supporting remote MCP connections!) and other MCP clients like Cursor, Windsurf, or our own AI Playground. Unlock Cloudflare tools, resources, and real time information through our new suite of MCP servers including:

… all through a natural language interface!

Today, we also announced our collaboration with Anthropic to bring remote MCP to Claude users, and showcased how other leading companies such as Atlassian, PayPal, Sentry, and Webflow have built remote MCP servers on Cloudflare to extend their service to their users. We’ve also been using the same infrastructure and tooling to build out our own suite of remote servers, and today we’re excited to show customers what’s ready for use and share what we’ve learned along the way.

Cloudflare’s MCP servers available today:

These MCP servers allow your MCP Client to read configurations from your account, process information, make suggestions based on data, and even make those suggested changes for you. All of these actions can happen across Cloudflare's many services including application development, security, and performance.

Cloudflare Documentation Server: Get up-to-date reference information on Cloudflare

Our Cloudflare Documentation server enables any MCP Client to access up-to-date documentation in real-time, rather than relying on potentially outdated information from the model's training data. If you’re new to building with Cloudflare, this server synthesizes information right from our documentation and exposes it to your MCP Client, so you can get reliable, up-to-date responses to any complex question like “Search Cloudflare for the best way to build an AI Agent”.

Workers Bindings server: Build with developer resources

Connecting to the Bindings MCP server lets you leverage application development primitives like D1 databases, R2 object storage and Key Value stores on the fly as you build out a Workers application. If you're leveraging your MCP Client to generate code, the bindings server provides access to read existing resources from your account or create fresh resources to implement in your application. In combination with our base prompt designed to help you build robust Workers applications, you can add the Bindings MCP server to give your client all it needs to start generating full stack applications from natural language.

Full example output using the Workers Bindings MCP server can be found here.

Workers Observability server: Debug your application

The Workers Observability MCP server integrates with Workers Logs to browse invocation logs and errors, compute statistics across invocations, and find specific invocations matching specific criteria. By querying logs across all of your Workers, this MCP server can help isolate errors and trends quickly. The telemetry data that the MCP server returns can also be used to create new visualizations and improve observability.

Container server: Spin up a development environment

The Container MCP server provides any MCP client with access to a secure, isolated execution environment running on Cloudflare’s network where it can run and test code if your MCP client does not have a built in development environment (e.g. claude.ai). When building and generating application code, this lets the AI run its own commands and validate its assumptions in real time.

Browser Rendering server: Fetch and convert web pages, take screenshots

The Browser Rendering MCP server provides AI friendly tools from our RESTful interface for common browser actions such as capturing screenshots, extracting HTML content, and converting pages to Markdown. These are particularly useful when building agents that require interacting with a web browser.

Radar server: Ask questions about how we see the Internet and Scan URLs

The Cloudflare Radar MCP server exposes tools that allow any MCP client to explore our aggregated HTTP traffic data, get information on Autonomous Systems (AS) and IP addresses, list traffic anomalies from our Outage Center, get trending domains, and domain rank information. It can even create charts. Here’s a chat where we ask "show me the HTTP traffic from Portugal for the last week":

Logpush server: Get quick summaries for Logpush job health

Logpush jobs deliver comprehensive logs to your destination of choice, allowing near real-time information processing. The Logpush MCP server can help you analyze your Logpush job results and understand your job health at a high level, allowing you to filter and narrow down for jobs or scenarios you care about. For example, you can ask “provide me with a list of recently failed jobs.” Now, you can quickly find out which jobs are failing with which error message and when, summarized in a human-readable format.

AI Gateway server: Check out your AI Gateway logs

Use this MCP server to inspect your AI Gateway logs and get details about the data from your prompts and the AI models responses. In this example we ask our agent “What is my average latency for my AI Gateway logs in the Cloudflare Radar account?”

AutoRAG server: List and search documents on your AutoRAGs

Having AutoRAG RAGs available to query as MCP tools greatly expands the typical static one-shot retrieval and opens doors to use cases where the agent can dynamically decide if and when to retrieve information from one or more RAGs, combine them with other tools and APIs, cross-check information and generate a much more rich and complete final answer.

Here we have a RAG that has a few blog posts that talk about retrocomputers. If we ask "tell me about restoring an amiga 1000 using the blog-celso autorag" the agent will go into a sequence of reasoning steps:

“Now that I have some information about Amiga 1000 restoration from blog-celso, let me search for more specific details.”
“Let me get more specific information about hardware upgrades and fixes for the Amiga 1000.”
“Let me get more information about the DiagROM and other tools used in the restoration.”
“Let me search for information about GBA1000 and other expansions mentioned in the blog.”
And finally, “Based on the comprehensive information I've gathered from the blog-celso AutoRAG, I can now provide you with a detailed guide on restoring an Amiga 1000.”

And at the end, it generates a very detailed answer based on all the data from all the queries:

Audit Logs server: Query audit logs and generate reports for review

Audit Logs record detailed information about actions and events within a system, providing a transparent history of all activity. However, because these logs can be large and complex, it may take effort to query and reconstruct a clear sequence of events. The Audit Logs MCP server helps by allowing you to query audit logs and generate reports. Common queries include if anything notable happened in a Cloudflare account under a user around a particular time of the day, or identifying whether any users used API keys to perform actions on the account. For example, you can ask “Were there any suspicious changes made to my Cloudflare account yesterday around lunchtime?” and obtain the following response:

DNS Analytics server: Optimize DNS performance and debug issues based on current set up

Cloudflare’s DNS Analytics provides detailed insights into DNS traffic, which helps you monitor, analyze, and troubleshoot DNS performance and security across your domains. With Cloudflare’s DNS Analytics MCP server, you can review DNS configurations across all domains in your account, access comprehensive DNS performance reports, and receive recommendations for performance improvements. By leveraging documentation, the MCP server can help identify opportunities for improving performance.

Digital Experience Monitoring server: Get quick insight on critical applications for your organization

Cloudflare Digital Experience Monitoring (DEM) was built to help network professionals understand the performance and availability of their critical applications from self-hosted applications like Jira and Bitbucket to SaaS applications like Figma or Salesforce. The Digital Experience Monitoring MCP server fetches DEM test results to surface performance and availability trends within your Cloudflare One deployment, providing quick insights on users, applications, and the networks they are connected to. You can ask questions like: Which users had the worst experience? What times of the day were applications most and least performant? When do I see the most HTTP status errors? When do I see the shortest, longest, or most instability in the network path?

CASB server: Insights from SaaS Integrations

Cloudflare CASB provides the ability to integrate with your organization’s SaaS and cloud applications to discover assets and surface any security misconfigurations that may be present. A core task is helping security teams understand information about users, files, and other assets they care about that transcends any one SaaS application. The CASB MCP server can explore across users, files, and the many other asset categories to help understand relationships from data that can exist across many different integrations. A common query may include “Tell me about “Frank Meszaros” and what SaaS tools they appear to have accessed”.

Get started with our MCP servers

You can start using our Cloudflare MCP servers today! If you’d like to read more about specific tools available in each server, you can find them in our public GitHub repository. Each server is deployed to a server URL, such as

https://observability.mcp.cloudflare.com/sse.

If your MCP client has first class support for remote MCP servers, the client will provide a way to accept the server URL directly within its interface. For example, if you are using claude.ai, you can:

Navigate to your settings and add a new “Integration” by entering the URL of your MCP server
Authenticate with Cloudflare
Select the tools you’d like claude.ai to be able to call

If your client does not yet support remote MCP servers, you will need to set up its respective configuration file (mcp_config.json) using mcp-remote to specify which servers your client can access.

Have feedback on our servers?

While we're launching with these initial 13 MCP servers, we are just getting started! We want to hear your feedback as we shape existing and build out more Cloudflare MCP servers that unlock the most value for your teams leveraging AI in their daily workflows. If you’d like to provide feedback, request a new MCP server, or report bugs, please raise an issue on our GitHub repository.

Building your own MCP server?

If you’re interested in building your own servers, we've discovered valuable best practices that we're excited to share with you as we’ve been building ours. While MCP is really starting to gain momentum and many organizations are just beginning to build their own servers, these principals should help guide you as you start building out MCP servers for your customers.

An MCP server is not our entire API schema: Our goal isn't to build a large wrapper around all of Cloudflare’s API schema, but instead focus on optimizing for specific jobs to be done and reliability of the outcome. This means while one tool from our MCP server may map to one API, another tool may map to many. We’ve found that fewer but more powerful tools may be better for the agent with smaller context windows, less costs, a faster output, and likely more valid answers from LLMs. Our MCP servers were created directly by the product teams who are responsible for each of these areas of Cloudflare – application development, security and performance – and are designed with user stories in mind. This is a pattern you will continue to see us use as we build out more Cloudflare servers.
Specialize permissions with multiple servers: We built out several specialized servers rather than one for a critical reason: security through precise permission scoping. Each MCP server operates with exactly the permissions needed for its specific task – nothing more. By separating capabilities across multiple servers, each with its own authentication scope, we prevent the common security pitfall of over-privileged access.
Add robust server descriptions within parameters: Tool descriptions were core to providing helpful context to the agent. We’ve found that more detailed descriptions help the agent understand not just the expected data type, but also the parameter's purpose, acceptable value ranges, and impact on server behavior. This context allows agents to make intelligent decisions about parameter values rather than providing arbitrary and potentially problematic inputs, allowing your natural language to go further with the agent.
Using evals at each iteration: For each server, we implemented evaluation tests or “evals” to assess the model's ability to follow instructions, select appropriate tools, and provide correct arguments to those tools. This gave us a programmatic way to understand if any regressions occurred through each iteration, especially when tweaking tool descriptions.

Ready to start building? Click the button below to deploy your first remote MCP server to production:

Or check out our documentation to learn more! If you have any questions or feedback for us, you can reach us via email at 1800-mcp@cloudflare.com or join the chatter in the Cloudflare Developers Discord.

MCP Demo Day: How 10 leading AI companies built MCP servers on Cloudflare

Dina Kozlov — Thu, 01 May 2025 13:00:21 GMT

Today, we're excited to collaborate with Anthropic, Asana, Atlassian, Block, Intercom, Linear, PayPal, Sentry, Stripe, and Webflow to bring a whole new set of remote MCP servers, all built on Cloudflare, to enable Claude users to manage projects, generate invoices, query databases, and even deploy full stack applications — without ever leaving the chat interface.

Since Anthropic’s introduction of the Model Context Protocol (MCP) in November, there’s been more and more excitement about it, and it seems like a new MCP server is being released nearly every day. And for good reason! MCP has been the missing piece to make AI agents a reality, and helped define how AI agents interact with tools to take actions and get additional context.

But to date, end-users have had to install MCP servers on their local machine to use them. Today, with Anthropic’s announcement of Integrations, you can access an MCP server the same way you would a website: type a URL and go.

At Cloudflare, we’ve been focused on building out the tooling that simplifies the development of remote MCP servers, so that our customers’ engineering teams can focus their time on building out the MCP tools for their application, rather than managing the complexities of the protocol. And if you’re wondering just how easy is it to deploy a remote MCP server on Cloudflare, we’re happy to tell you that it only takes one click to get an MCP server — pre-built with support for the latest MCP standards — deployed.

But you don’t have to take our word for it, see it for yourself! Industry leaders are taking advantage of the ease of use to deliver new AI-powered experiences to their users by building their MCP servers on Cloudflare and now, you can do the same — just click “Deploy to Cloudflare” to get started.

Keep reading to learn more about the new capabilities that these companies are unlocking for their users and how they were able to deliver them. Or, see it in action by joining us for Demo Day on May 1 (today) at 10:00 AM PST.

We’re also making Cloudflare's remote MCP servers available to customers today and sharing what we learned from building them out.

MCP: Powering the next generation of applications

It wasn’t always the expectation that every service, whether a store, real estate agent, or service would have a website. But as more people gained access to an Internet connection, that quickly became the case.

We’re in the midst of a similar transition now to every web user having access to AI tools, turning to them for many tasks. If you’re a developer, it’s likely the case that the first place you turn when you go to write code now is to a tool like Claude. It seems reasonable then, that if Claude helped you write the code, it would also help you deploy it.

Or if you’re not a developer, if Claude helped you come up with a recipe, that it would also help you order the required groceries.

With remote MCP, all of these scenarios are now possible. And just like the first businesses built on the web had a first mover advantage, the first businesses to be built in an MCP-forward way are likely to reap the benefits.

The faster a user can experience the value of your product, the more likely they will be to succeed, upgrade, and continue to use your product. By connecting services to agents through MCP, users can simply ask for what they need and the agent will handle the rest, getting them to that “aha” moment faster.

Making your service AI-first with MCP

Businesses that adopt MCP will quickly see the impact:

Lower the barrier to entry

Not every user has the time to learn your dashboard or read through documentation to understand your product. With MCP, they don’t need to. They just describe what they want, and the agent figures out the rest.

Create personalized experiences

MCP can keep track of a user’s requests and interactions, so future tool calls can be adapted to their usage patterns and preferences. This makes it easy to deliver more personalized, relevant experiences based on how each person actually uses your product.

Drive upgrades naturally

Rather than relying on feature comparison tables, the AI agent can explain how a higher-tier plan helps a specific user accomplish their goals.

Ship new features and integrations

You don’t need to build every integration or experience in-house. By exposing your tools via MCP, you let users connect your product to the rest of their stack. Agents can combine tools across providers, enabling integrations without requiring you to support every third-party service directly.

Why build MCP on Cloudflare?

Since the launch of MCP, we’ve shared how we’re making it easy for developers to build and deploy remote MCP servers — from abstracting away protocol complexity to handling auth and transport, to supporting fully stateful MCP servers (by default) that “sleep” when they’re not used to minimize idle costs.

We’re continuing to ship updates regularly — adding support for the latest changes to the MCP standard and making it even easier to get your server to production:

Supporting the new Streamable HTTP transport: Your MCP servers can now use the latest Streamable HTTP transport alongside SSE, ensuring compatibility with the latest standards. We’ve updated the AI Playground and mcp-remote proxy as well, giving you a remote MCP client to easily test the new transport.
Python support for MCP servers: You can now build MCP servers on Cloudflare using Python, not just JavaScript/TypeScript.
Improved docs, starter templates, and deploy flows: We’ve added new quickstart templates, expanded our documentation to cover the new transport method, and shared best practices for building MCP servers based on our learnings.

But rather than telling you, we thought it would be better to show you what our customers have built and why they chose Cloudflare for their MCP deployment.

Remote MCP servers you can connect to today

Asana

Today, work lives across many different apps and services. By investing in MCP, Asana can meet users where they are, enabling agent-driven interoperability across tools and workflows. Users can interact with the Asana Work Graph using natural language to get project updates, search for tasks, manage projects, send comments, and update deadlines from an MCP client.

Users will be able to orchestrate and integrate different pieces of work seamlessly — for example, turning a project plan or meeting notes from another app into a fully assigned set of tasks in Asana, or pulling Asana tasks directly into an MCP-enabled IDE for implementation. This flexibility makes managing work across systems easier, faster, and more natural than ever before.

To accelerate the launch of our first-party MCP server, Asana built on Cloudflare's tooling, taking advantage of the foundational infrastructure to move quickly and reliably in this fast-evolving space.

"At Asana, we've always focused on helping teams coordinate work effortlessly. MCP connects our Work Graph directly to AI tools like Claude.ai, enabling AI to become a true teammate in work management. Our integration transforms natural language into structured work – creating projects from meeting notes or pulling updates into AI. Building on Cloudflare's infrastructure allowed us to deploy quickly, handling authentication and scaling while we focused on creating the best experience for our users." – Prashant Pandey, Chief Technology Officer, Asana

Learn more about Asana’s MCP server here.

Atlassian

Jira and Confluence Cloud customers can now securely interact with their data directly from Anthropic’s Claude app via the Atlassian Remote MCP Server in beta. Hosted on Cloudflare infrastructure, users can summarize work, create issues or pages, and perform multi-step actions, all while keeping data secure and within permissioned boundaries.

Users can access information from Jira and Confluence wherever they use Claude to:

Summarize Jira work items or Confluence pages
Create Jira work items or Confluence pages directly from Claude
Get the model to take multiple actions in one go, like creating issues or pages in bulk
Enrich Jira work items with context from many different sources to which Claude has access
And so much more!

“AI is not one-size-fits-all and we believe that it needs to be embedded within a team’s jobs to be done. That’s why we’re so excited to invest in MCP and meet teams in more places where they already work. Hosting on Cloudflare infrastructure means we can bring this powerful integration to our customers faster and empower them to do more than ever with Jira and Confluence, all while keeping their enterprise data secure. Cloudflare provided everything from OAuth to out-of-the-box remote MCP support so we could quickly build, secure, and scale a fully operational setup.” — Taroon Mandhana, Head of Product Engineering, Atlassian

Learn more about Atlassian’s MCP server here.

Intercom

At Intercom, the transformative power of AI is becoming increasingly clear. Fin, Intercom’s AI Agent, is now autonomously resolving over 50% of customer support conversations for leading companies such as Anthropic. With MCP, connecting AI to internal tools and systems is easier than ever, enabling greater business value.

Customer conversations, for instance, offer valuable insights into how products are being used and the experiences customers are having. However, this data is often locked within the support platform. The Intercom MCP server unlocks this rich source of customer data, making it accessible to anyone in the organization using AI tools. Engineers, for example, can leverage conversation history and user data from Intercom in tools like Cursor or Claude Code to diagnose and resolve issues more efficiently.

“The momentum behind MCP is exciting. It’s making it easier and easier to connect assistants like Claude.ai and agents like Fin to your systems and get real work done. Cloudflare's toolkit is accelerating that movement even faster. Their clear documentation, purpose-built tools, and developer-first platform helped Intercom go from concept to production in under a day, making the Intercom MCP server launch effortless. We’ll be encouraging our customers to leverage Cloudflare to build and deploy their own MCP servers to securely and reliably connect their internal systems to Fin and other clients.” — Jordan Neill, SVP Engineering, Intercom

Linear

The Linear MCP server allows users to bring the context of their issues and product development process directly into AI assistants when it's needed. Whether that is refining a product spec in Claude, collaborating on fixing a bug with Cursor, or creating issues on the fly from an email.

“We're building on Cloudflare to take advantage of their frameworks in this fast-moving space and flexible, fast, compute at the edge. With MCP, we're bringing Linear's issue tracking and product development workflows directly into their AI tools of choice, eliminating context switching for teams. Our goal is simple: let developers and product teams access their work where they already are—whether refining specs in Claude, debugging in Cursor, or creating issues from conversations. This seamless integration helps our customers stay in flow and focused on building great products” — Tom Moor, Head of US Engineering, Linear

Learn more about Linear’s MCP server here.

PayPal

"MCPs represent a new paradigm for software development. With PayPal's remote MCP server on Cloudflare, now developers can delegate to an agent with natural language to seamlessly integrate with PayPal's portfolio of commerce capabilities. Whether it's managing inventory, processing payments, tracking shipping, handling refunds, AI agents via MCP can tap into these capabilities to autonomously execute and optimize commerce workflows. This is a revolutionary development for commerce, and the best part is, developers can begin integrating with our MCP server on Cloudflare today." - Prakhar Mehrotra, SVP of Artificial Intelligence, PayPal

Learn more about PayPal’s MCP server here.

Sentry

With the Sentry MCP server, developers are able to query Sentry’s context right from their IDE, or an assistant like Claude, to get errors and issue information across projects or even for individual files.

Developers can also use the MCP to create projects, capture setup information, and query project and organization information - and use the information to set up their applications for Sentry. As we continue to build out the MCP further, we'll allow teams to bring in root cause analysis and solution information from Seer, our Agent, and also look at simplifying instrumentation for sentry functionality like traces, and exception handling.

Hosting this on Cloudflare and using Remote MCP, we were able to sidestep a number of the complications of trying to run locally, like scaling or authentication. Remote MCP lets us leverage Sentry’s own OAuth configuration directly. Durable Object support also lets us maintain state within the MCP, which is important when you’re not running locally.

“Sentry’s commitment has always been to the developer, and making it easier to keep production software running stable, and that’s going to be even more true in the AI era. Developers are utilizing tools like MCP to integrate their stack with AI models and data sources. We chose to build our MCP on Cloudflare because we share a vision of making it easier for developers to ship software, and are both invested in ensuring teams can build and safely run the next generation of AI agents. Debugging the complex interactions arising from these integrations is increasingly vital, and Sentry provides the essential visibility needed to rapidly diagnose and resolve issues. MCP integrates this crucial Sentry context directly into the developer workflow, empowering teams to consistently build and deploy reliable applications.” — David Cramer, CPO and Co-Founder, Sentry

Learn more about Sentry’s MCP server here.

Block

Square's APIs are a comprehensive set of tools that help sellers take payments, create and track orders, manage inventory, organize customers, and more. Now, with a dedicated Square MCP server, sellers can enlist the help of an AI agent to build their business on Square’s entire suite of API resources and endpoints. By integrating with AI agents like Claude and codename goose, sellers can craft sophisticated, customized use cases that fully utilize Square’s capabilities, at a lower technical barrier.

Learn more about Block’s MCP server here.

Stripe

“MCP is emerging as a new AI interface. In the near-future, MCP may become the default way, or in some cases the only way, people, businesses, and code discover and interact with services. With Stripe's agent toolkit and Cloudflare’s Agent SDK, developers can now monetize their MCPs with just a few lines of code.” — Jeff Weinstein, Product Lead at Stripe

Webflow

The Webflow MCP server supports CMS management, auditing and improving SEO, content localization, site publishing, and more. This enables users to manage and improve their site directly from their AI agent.

"We see MCP as a new way to interact with Webflow that aligns well with our mission of bringing development superpowers to everyone. MCP is not just a different surface over our APIs, instead we’re thinking of it in terms of the actions it supports: publish a website, create a CMS collection, update content, and more. MCP lets us expose those actions in a way that’s discoverable, secure, and deeply contextual. It opens the door to new workflows where AI and humans can work side-by-side, without needing to cobble together solutions or handle low-level API details. Cloudflare supports this aim by offering the reliability, performance, and developer tooling we need to build modern web infrastructure. Their support for remote MCP servers is mature and well-integrated, and their approach to authentication and durability aligns with how we think about the scale and security of these offerings.” — Utkarsh Sengar, VP of Engineering, Webflow

Learn more about Webflow’s MCP server here.

Start building today

If you’re looking to build a remote MCP server for your service, get started with our documentation, watch the tutorial below, or use the button below to get a starter remote MCP server deployed to production. Once the remote MCP server is deployed, it can be used from Claude, Cloudflare’s AI playground, or any remote MCP client.

In addition, we launched a new YouTube video walking you through building MCP servers, using two of our MCP templates.

If you have any questions or feedback for us, you can reach us via email at 1800-mcp@cloudflare.com.

Bringing streamable HTTP transport and Python language support to MCP servers

Jeremy Morrell — Wed, 30 Apr 2025 14:00:00 GMT

We’re continuing to make it easier for developers to bring their services into the AI ecosystem with the Model Context Protocol (MCP). Today, we’re announcing two new capabilities:

Streamable HTTP Transport: The Agents SDK now supports the new Streamable HTTP transport, allowing you to future-proof your MCP server. Our implementation allows your MCP server to simultaneously handle both the new Streamable HTTP transport and the existing SSE transport, maintaining backward compatibility with all remote MCP clients.
Deploy MCP servers written in Python: In 2024, we introduced first-class Python language support in Cloudflare Workers, and now you can build MCP servers on Cloudflare that are entirely written in Python.

Click “Deploy to Cloudflare” to get started with a remote MCP server that supports the new Streamable HTTP transport method, with backwards compatibility with the SSE transport.

Streamable HTTP: A simpler way for AI agents to communicate with services via MCP

The MCP spec was updated on March 26 to introduce a new transport mechanism for remote MCP, called Streamable HTTP. The new transport simplifies how AI agents can interact with services by using a single HTTP endpoint for sending and receiving responses between the client and the server, replacing the need to implement separate endpoints for initializing the connection and for sending messages.

Upgrading your MCP server to use the new transport method

If you've already built a remote MCP server on Cloudflare using the Cloudflare Agents SDK, then adding support for Streamable HTTP is straightforward. The SDK has been updated to support both the existing Server-Sent Events (SSE) transport and the new Streamable HTTP transport concurrently.

Here's how you can configure your server to handle both transports:

Or, if you’re using Hono:

Or if your MCP server implements authentication & authorization using the Workers OAuth Provider Library:

The key changes are:

Use MyMcpAgent.serveSSE('/sse') for the existing SSE transport. Previously, this would have been MyMcpAgent.mount('/sse'), which has been kept as an alias.
Add a new path with MyMcpAgent.serve('/mcp') to support the new Streamable HTTP transport

That's it! With these few lines of code, your MCP server will support both transport methods, making it compatible with both existing and new clients.

Using Streamable HTTP from an MCP client

While most MCP clients haven’t yet adopted the new Streamable HTTP transport, you can start testing it today using mcp-remote, an adapter that lets MCP clients like Claude Desktop that otherwise only support local connections work with remote MCP servers. This tool allows any MCP client to connect to remote MCP servers via either SSE or Streamable HTTP, even if the client doesn't natively support remote connections or the new transport method.

So, what’s new with Streamable HTTP?

Initially, remote MCP communication between AI agents and services used a single connection but required interactions with two different endpoints: one endpoint (/sse) to establish a persistent Server-Sent Events (SSE) connection that the client keeps open for receiving responses and updates from the server, and another endpoint (/sse/messages) where the client sends requests for tool calls.

While this works, it's like having a conversation with two phones, one for listening and one for speaking. This adds complexity to the setup, makes it harder to scale, and requires connections to be kept open for long periods of time. This is because SSE operates as a persistent one-way channel where servers push updates to clients. If this connection closes prematurely, clients will miss responses or updates sent from the MCP server during long-running operations.

The new Streamable HTTP transport addresses these challenges by enabling:

Communication through a single endpoint: All MCP interactions now flow through one endpoint, eliminating the need to manage separate endpoints for requests and responses, reducing complexity.
Bi-directional communication: Servers can send notifications and requests back to clients on the same connection, enabling the server to prompt for additional information or provide real-time updates.
Automatic connection upgrades: Connections start as standard HTTP requests, but can dynamically upgrade to SSE (Server-Sent Events) to stream responses during long-running tasks.

Now, when an AI agent wants to call a tool on a remote MCP server, it can do so with a single POST request to one endpoint (/mcp). Depending on the tool call, the server will either respond immediately or decide to upgrade the connection to use SSE to stream responses or notifications as they become available — all over the same request.

Our current implementation of Streamable HTTP provides feature parity with the previous SSE transport. We're actively working to implement the full capabilities defined in the specification, including resumability, cancellability, and session management to enable more complex, reliable, and scalable agent-to-agent interactions.

What’s coming next?

The MCP specification is rapidly evolving, and we're committed to bringing these changes to the Agents SDK to keep your MCP server compatible with all clients. We're actively tracking developments across both transport and authorization, adding support as they land, and maintaining backward compatibility to prevent breaking changes as adoption grows. Our goal is to handle the complexity behind the scenes, so you can stay focused on building great agent experiences.

On the transport side, here are some of the improvements coming soon to the Agents SDK:

Resumability: If a connection drops during a long-running operation, clients will be able to resume exactly where they left off without missing any responses. This eliminates the need to keep connections open continuously, making it ideal for AI agents that run for hours.
Cancellability: Clients will have explicit mechanisms to cancel operations, enabling cleaner termination of long-running processes.
Session management: We're implementing secure session handling with unique session IDs that maintain state across multiple connections, helping build more sophisticated agent-to-agent communication patterns.

Deploying Python MCP Servers on Cloudflare

In 2024, we introduced Python Workers, which lets you write Cloudflare Workers entirely in Python. Now, you can use them to build and deploy remote MCP servers powered by the Python MCP SDK — a library for defining tools and resources using regular Python functions.

You can deploy a Python MCP server to your Cloudflare account with the button below, or read the code here.

Here’s how you can define tools and resources in the MCP server:

If you're already building APIs with FastAPI, a popular Python package for quickly building high performance API servers, you can use FastAPI-MCP to expose your existing endpoints as MCP tools. It handles the protocol boilerplate for you, making it easy to bring FastAPI-based services into the agent ecosystem.

With recent updates like support for Durable Objects and Cron Triggers in Python Workers, it’s now easier to run stateful logic and scheduled tasks directly in your MCP server.

Start building a remote MCP server today!

On Cloudflare, you can start building today. We’re ready for you, and ready to help build with you. Email us at 1800-mcp@cloudflare.com, and we’ll help get you going. There’s lots more to come with MCP, and we’re excited to see what you build.

Piecing together the Agent puzzle: MCP, authentication & authorization, and Durable Objects free tier

Rita Kozlov — Mon, 07 Apr 2025 13:10:00 GMT

It’s not a secret that at Cloudflare we are bullish on the future of agents. We’re excited about a future where AI can not only co-pilot alongside us, but where we can actually start to delegate entire tasks to AI.

While it hasn’t been too long since we first announced our Agents SDK to make it easier for developers to build agents, building towards an agentic future requires continuous delivery towards this goal. Today, we’re making several announcements to help accelerate agentic development, including:

New Agents SDK capabilities: Build remote MCP clients, with transport and authentication built-in, to allow AI agents to connect to external services.
BYO Auth provider for MCP: Integrations with Stytch, Auth0, and WorkOS to add authentication and authorization to your remote MCP server.
Hibernation for McpAgent: Automatically sleep stateful, remote MCP servers when inactive and wake them when needed. This allows you to maintain connections for long-running sessions while ensuring you’re not paying for idle time.
Durable Objects free tier: We view Durable Objects as a key component for building agents, and if you’re using our Agents SDK, you need access to it. Until today, Durable Objects was only accessible as part of our paid plans, and today we’re excited to include it in our free tier.
Workflows GA: Enables you to ship production-ready, long-running, multi-step actions in agents.
AutoRAG: Helps you integrate context-aware AI into your applications, in just a few clicks
agents.cloudflare.com: our new landing page for all things agents.

New MCP capabilities in Agents SDK

AI agents can now connect to and interact with external services through MCP (Model Context Protocol). We’ve updated the Agents SDK to allow you to build a remote MCP client into your AI agent, with all the components — authentication flows, tool discovery, and connection management — built-in for you.

This allows you to build agents that can:

Prompt the end user to grant access to a 3rd party service (MCP server).
Use tools from these external services, acting on behalf of the end user.
Call MCP servers from Workflows, scheduled tasks, or any part of your agent.
Connect to multiple MCP servers and automatically discover new tools or capabilities presented by the 3rd party service.

MCP (Model Context Protocol) — first introduced by Anthropic — is quickly becoming the standard way for AI agents to interact with external services, with providers like OpenAI, Cursor, and Copilot adopting the protocol.

We recently announced support for building remote MCP servers on Cloudflare, and added an McpAgent class to our Agents SDK that automatically handles the remote aspects of MCP: transport and authentication/authorization. Now, we’re excited to extend the same capabilities to agents acting as MCP clients.

Want to see it in action? Use the button below to deploy a fully remote MCP client that can be used to connect to remote MCP servers.

AI Agents can now act as remote MCP clients, with transport and auth included

AI agents need to connect to external services to access tools, data, and capabilities beyond their built-in knowledge. That means AI agents need to be able to act as remote MCP clients, so they can connect to remote MCP servers that are hosting these tools and capabilities.

We’ve added a new class, MCPClientManager, into the Agents SDK to give you all the tooling you need to allow your AI agent to make calls to external services via MCP. The MCPClientManager class automatically handles:

Transport: Connect to remote MCP servers over SSE and HTTP, with support for Streamable HTTP coming soon.
Connection management: The client tracks the state of all connections and automatically reconnects if a connection is lost.
Capability discovery: Automatically discovers all capabilities, tools, resources, and prompts presented by the MCP server.
Real-time updates: When a server's tools, resources, or prompts change, the client automatically receives notifications and updates its internal state.
Namespacing: When connecting to multiple MCP servers, all tools and resources are automatically namespaced to avoid conflicts.

Granting agents access to tools with built-in auth check for MCP Clients

We've integrated the complete OAuth authentication flow directly into the Agents SDK, so your AI agents can securely connect and authenticate to any remote MCP server without you having to build authentication flow from scratch.

This allows you to give users a secure way to log in and explicitly grant access to allow the agent to act on their behalf by automatically:

Supporting the OAuth 2.1 protocol.
Redirecting users to the service’s login page.
Generating the code challenge and exchanging an authorization code for an access token.
Using the access token to make authenticated requests to the MCP server.

Here is an example of an agent that can securely connect to MCP servers by initializing the client manager, adding the server, and handling the authentication callbacks:

Connecting to multiple MCP servers and discovering what capabilities they offer

You can use the Agents SDK to connect an MCP client to multiple MCP servers simultaneously. This is particularly useful when you want your agent to access and interact with tools and resources served by different service providers.

The MCPClientManager class maintains connections to multiple MCP servers through the mcpConnections object, a dictionary that maps unique server names to their respective MCPClientConnection instances.

When you register a new server connection using connect(), the manager:

Creates a new connection instance with server-specific authentication.
Initializes the connections and registers for server capability notifications.

Each connection manages its own authentication context, allowing one AI agent to authenticate to multiple servers simultaneously. In addition, MCPClientManager automatically handles namespacing to prevent collisions between tools with identical names from different servers.

For example, if both an “Image MCP Server” and “Code MCP Server” have a tool named “analyze”, they will both be independently callable without any naming conflicts.

Use Stytch, Auth0, and WorkOS to bring authentication & authorization to your MCP server

With MCP, users will have a new way of interacting with your application, no longer relying on the dashboard or API as the entrypoint. Instead, the service will now be accessed by AI agents that are acting on a user’s behalf. To ensure users and agents can connect to your service securely, you’ll need to extend your existing authentication and authorization system to support these agentic interactions, implementing login flows, permissions scopes, consent forms, and access enforcement for your MCP server.

We’re adding integrations with Stytch, Auth0, and WorkOS to make it easier for anyone building an MCP server to configure authentication & authorization for their MCP server.

You can leverage our MCP server integration with Stytch, Auth0, and WorkOS to:

Allow users to authenticate to your MCP server through email, social logins, SSO (single sign-on), and MFA (multi-factor authentication).
Define scopes and permissions that directly map to your MCP tools.
Present users with a consent page corresponding with the requested permissions.

Enforce the permissions so that agents can only invoke permitted tools.

Get started with the examples below by using the “Deploy to Cloudflare” button to deploy the demo MCP servers in your Cloudflare account. These demos include pre-configured authentication endpoints, consent flows, and permission models that you can tailor to fit your needs. Once you deploy the demo MCP servers, you can use the Workers AI playground, a browser-based remote MCP client, to test out the end-to-end user flow.

Stytch

Get started with a remote MCP server that uses Stytch to allow users to sign in with email, Google login or enterprise SSO and authorize their AI agent to view and manage their company’s OKRs on their behalf. Stytch will handle restricting the scopes granted to the AI agent based on the user’s role and permissions within their organization. When authorizing the MCP Client, each user will see a consent page that outlines the permissions that the agent is requesting that they are able to grant based on their role.

For more consumer use cases, deploy a remote MCP server for a To Do app that uses Stytch for authentication and MCP client authorization. Users can sign in with email and immediately access the To Do lists associated with their account, and grant access to any AI assistant to help them manage their tasks.

Regardless of use case, Stytch allows you to easily turn your application into an OAuth 2.0 identity provider and make your remote MCP server into a Relying Party so that it can easily inherit identity and permissions from your app. To learn more about how Stytch is enabling secure authentication to remote MCP servers, read their blog post.

“One of the challenges of realizing the promise of AI agents is enabling those agents to securely and reliably access data from other platforms. Stytch Connected Apps is purpose-built for these agentic use cases, making it simple to turn your app into an OAuth 2.0 identity provider to enable secure access to remote MCP servers. By combining Cloudflare Workers with Stytch Connected Apps, we're removing the barriers for developers, enabling them to rapidly transition from AI proofs-of-concept to secure, deployed implementations.” — Julianna Lamb, Co-Founder & CTO, Stytch.

Auth0

Get started with a remote MCP server that uses Auth0 to authenticate users through email, social logins, or enterprise SSO to interact with their todos and personal data through AI agents. The MCP server securely connects to API endpoints on behalf of users, showing exactly which resources the agent will be able to access once it gets consent from the user. In this implementation, access tokens are automatically refreshed during long running interactions.

To set it up, first deploy the protected API endpoint:

Then, deploy the MCP server that handles authentication through Auth0 and securely connects AI agents to your API endpoint.

"Cloudflare continues to empower developers building AI products with tools like AI Gateway, Vectorize, and Workers AI. The recent addition of Remote MCP servers further demonstrates that Cloudflare Workers and Durable Objects are a leading platform for deploying serverless AI. We’re very proud that Auth0 can help solve the authentication and authorization needs for these cutting-edge workloads." — Sandrino Di Mattia, Auth0 Sr. Director, Product Architecture.

WorkOS

Get started with a remote MCP server that uses WorkOS's AuthKit to authenticate users and manage the permissions granted to AI agents. In this example, the MCP server dynamically exposes tools based on the user's role and access rights. All authenticated users get access to the add tool, but only users who have been assigned the image_generation permission in WorkOS can grant the AI agent access to the image generation tool. This showcases how MCP servers can conditionally expose capabilities to AI agents based on the authenticated user's role and permission.

“MCP is becoming the standard for AI agent integration, but authentication and authorization are still major gaps for enterprise adoption. WorkOS Connect enables any application to become an OAuth 2.0 authorization server, allowing agents and MCP clients to securely obtain tokens for fine-grained permission authorization and resource access. With Cloudflare Workers, developers can rapidly deploy remote MCP servers with built-in OAuth and enterprise-grade access control. Together, WorkOS and Cloudflare make it easy to ship secure, enterprise-ready agent infrastructure.” — Michael Grinich, CEO of WorkOS.

Hibernate-able WebSockets: put AI agents to sleep when they’re not in use

Starting today, a new improvement is landing in the McpAgent class: support for the WebSockets Hibernation API that allows your MCP server to go to sleep when it’s not receiving requests and instantly wake up when it’s needed. That means that you now only pay for compute when your agent is actually working.

We recently introduced the McpAgent class, which allows developers to build remote MCP servers on Cloudflare by using Durable Objects to maintain stateful connections for every client session. We decided to build McpAgent to be stateful from the start, allowing developers to build servers that can remember context, user preferences, and conversation history. But maintaining client connections means that the session can remain active for a long time, even when it’s not being used.

MCP Agents are hibernate-able by default

You don’t need to change your code to take advantage of hibernation. With our latest SDK update, all McpAgent instances automatically include hibernation support, allowing your stateful MCP servers to sleep during inactive periods and wake up with their state preserved when needed.

How it works

When a request comes in on the Server-Sent Events endpoint, /sse, the Worker initializes a WebSocket connection to the appropriate Durable Object for the session and returns an SSE stream back to the client. All responses flow over this stream.

The implementation leverages the WebSocket Hibernation API within Durable Objects. When periods of inactivity occur, the Durable Object can be evicted from memory while keeping the WebSocket connection open. If the WebSocket later receives a message, the runtime recreates the Durable Object and delivers the message to the appropriate handler.

Durable Objects on free tier

To help you build AI agents on Cloudflare, we’re making Durable Objects available on the free tier, so you can start with zero commitment. With Agents SDK, your AI agents deploy to Cloudflare running on Durable Objects.

Durable Objects offer compute alongside durable storage, that when combined with Workers, unlock stateful, serverless applications. Each Durable Object is a stateful coordinator for handling client real-time interactions, making requests to external services like LLMs, and creating agentic “memory” through state persistence in zero-latency SQLite storage — all tasks required in an AI agent. Durable Objects scale out to millions of agents effortlessly, with each agent created near the user interacting with their agent for fast performance, all managed by Cloudflare.

Zero-latency SQLite storage in Durable Objects was introduced in public beta September 2024 for Birthday Week. Since then, we’ve focused on missing features and robustness compared to pre-existing key-value storage in Durable Objects. We are excited to make SQLite storage generally available, with a 10 GB SQLite database per Durable Object, and recommend SQLite storage for all new Durable Object classes. Durable Objects free tier can only access SQLite storage.

Cloudflare’s free tier allows you to build real-world applications. On the free plan, every Worker request can call a Durable Object. For usage-based pricing, Durable Objects incur compute and storage usage with the following free tier limits.

Find us at agents.cloudflare.com

We realize this is a lot of information to take in, but don’t worry. Whether you’re new to agents as a whole, or looking to learn more about how Cloudflare can help you build agents, today we launched a new site to help get you started — agents.cloudflare.com.

Let us know what you build!

Build and deploy Remote Model Context Protocol (MCP) servers to Cloudflare

Brendan Irvine-Broque — Tue, 25 Mar 2025 13:59:00 GMT

It feels like almost everyone building AI applications and agents is talking about the Model Context Protocol (MCP), as well as building MCP servers that you install and run locally on your own computer.

You can now build and deploy remote MCP servers to Cloudflare. We’ve added four things to Cloudflare that handle the hard parts of building remote MCP servers for you:

workers-oauth-provider — an OAuth Provider that makes authorization easy
McpAgent — a class built into the Cloudflare Agents SDK that handles remote transport
mcp-remote — an adapter that lets MCP clients that otherwise only support local connections work with remote MCP servers
AI playground as a remote MCP client — a chat interface that allows you to connect to remote MCP servers, with the authentication check included

The button below, or the developer docs, will get you up and running in production with this example MCP server in less than two minutes:

Unlike the local MCP servers you may have previously used, remote MCP servers are accessible on the Internet. People simply sign in and grant permissions to MCP clients using familiar authorization flows. We think this is going to be a massive deal — connecting coding agents to MCP servers has blown developers’ minds over the past few months, and remote MCP servers have the same potential to open up similar new ways of working with LLMs and agents to a much wider audience, including more everyday consumer use cases.

From local to remote — bringing MCP to the masses

MCP is quickly becoming the common protocol that enables LLMs to go beyond inference and RAG, and take actions that require access beyond the AI application itself (like sending an email, deploying a code change, publishing blog posts, you name it). It enables AI agents (MCP clients) to access tools and resources from external services (MCP servers).

To date, MCP has been limited to running locally on your own machine — if you want to access a tool on the web using MCP, it’s up to you to set up the server locally. You haven’t been able to use MCP from web-based interfaces or mobile apps, and there hasn’t been a way to let people authenticate and grant the MCP client permission. Effectively, MCP servers haven’t yet been brought online.

Support for remote MCP connections changes this. It creates the opportunity to reach a wider audience of Internet users who aren’t going to install and run MCP servers locally for use with desktop apps. Remote MCP support is like the transition from desktop software to web-based software. People expect to continue tasks across devices and to login and have things just work. Local MCP is great for developers, but remote MCP connections are the missing piece to reach everyone on the Internet.

Making authentication and authorization just work with MCP

Beyond just changing the transport layer — from stdio to streamable HTTP — when you build a remote MCP server that uses information from the end user’s account, you need authentication and authorization. You need a way to allow users to login and prove who they are (authentication) and a way for users to control what the AI agent will be able to access when using a service (authorization).

MCP does this with OAuth, which has been the standard protocol that allows users to grant applications to access their information or services, without sharing passwords. Here, the MCP Server itself acts as the OAuth Provider. However, OAuth with MCP is hard to implement yourself, so when you build MCP servers on Cloudflare we provide it for you.

workers-oauth-provider — an OAuth 2.1 Provider library for Cloudflare Workers

When you deploy an MCP Server to Cloudflare, your Worker acts as an OAuth Provider, using workers-oauth-provider, a new TypeScript library that wraps your Worker’s code, adding authorization to API endpoints, including (but not limited to) MCP server API endpoints.

Your MCP server will receive the already-authenticated user details as a parameter. You don’t need to perform any checks of your own, or directly manage tokens. You can still fully control how you authenticate users: from what UI they see when they log in, to which provider they use to log in. You can choose to bring your own third-party authentication and authorization providers like Google or GitHub, or integrate with your own.

The complete MCP OAuth flow looks like this:

Here, your MCP server acts as both an OAuth client to your upstream service, and as an OAuth server (also referred to as an OAuth “provider”) to MCP clients. You can use any upstream authentication flow you want, but workers-oauth-provider guarantees that your MCP server is spec-compliant and able to work with the full range of client apps & websites. This includes support for Dynamic Client Registration (RFC 7591) and Authorization Server Metadata (RFC 8414).

A simple, pluggable interface for OAuth

When you build an MCP server with Cloudflare Workers, you provide an instance of the OAuth Provider paths to your authorization, token, and client registration endpoints, along with handlers for your MCP Server, and for auth:

This abstraction lets you easily plug in your own authentication. Take a look at this example that uses GitHub as the identity provider for an MCP server, in less than 100 lines of code, by implementing /callback and /authorize routes.

Why do MCP servers issue their own tokens?

You may have noticed in the authorization diagram above, and in the authorization section of the MCP spec, that the MCP server issues its own token to the MCP client.

Instead of passing the token it receives from the upstream provider directly to the MCP client, your Worker stores an encrypted access token in Workers KV. It then issues its own token to the client. As shown in the GitHub example above, this is handled on your behalf by the workers-oauth-provider — your code never directly handles writing this token, preventing mistakes. You can see this in the following code snippet from the GitHub example above:

On the surface, this indirection might sound more complicated. Why does it work this way?

By issuing its own token, MCP Servers can restrict access and enforce more granular controls than the upstream provider. If a token you issue to an MCP client is compromised, the attacker only gets the limited permissions you've explicitly granted through your MCP tools, not the full access of the original token.

Let’s say your MCP server requests that the user authorize permission to read emails from their Gmail account, using the gmail.readonly scope. The tool that the MCP server exposes is more narrow, and allows reading travel booking notifications from a limited set of senders, to handle a question like “What’s the check-out time for my hotel room tomorrow?” You can enforce this constraint in your MCP server, and if the token you issue to the MCP client is compromised, because the token is to your MCP server — and not the raw token to the upstream provider (Google) — an attacker cannot use it to read arbitrary emails. They can only call the tools your MCP server provides. OWASP calls out “Excessive Agency” as one of the top risk factors for building AI applications, and by issuing its own token to the client and enforcing constraints, your MCP server can limit tools access to only what the client needs.

Or building off the earlier GitHub example, you can enforce that only a specific user is allowed to access a particular tool. In the example below, only users that are part of an allowlist can see or call the generateImage tool, that uses Workers AI to generate an image based on a prompt:

Introducing McpAgent: remote transport support that works today, and will work with the revision to the MCP spec

The next step to opening up MCP beyond your local machine is to open up a remote transport layer for communication. MCP servers you run on your local machine just communicate over stdio, but for an MCP server to be callable over the Internet, it must implement remote transport.

The McpAgent class we introduced today as part of our Agents SDK handles this for you, using Durable Objects behind the scenes to hold a persistent connection open, so that the MCP client can send server-sent events (SSE) to your MCP server. You don’t have to write code to deal with transport or serialization yourself. A minimal MCP server in 15 lines of code can look like this:

After much discussion, remote transport in the MCP spec is changing, with Streamable HTTP replacing HTTP+SSE This allows for stateless, pure HTTP connections to MCP servers, with an option to upgrade to SSE, and removes the need for the MCP client to send messages to a separate endpoint than the one it first connects to. The McpAgent class will change with it and just work with streamable HTTP, so that you don’t have to start over to support the revision to how transport works.

This applies to future iterations of transport as well. Today, the vast majority of MCP servers only expose tools, which are simple remote procedure call (RPC) methods that can be provided by a stateless transport. But more complex human-in-the-loop and agent-to-agent interactions will need prompts and sampling. We expect these types of chatty, two-way interactions will need to be real-time, which will be challenging to do well without a bidirectional transport layer. When that time comes, Cloudflare, the Agents SDK, and Durable Objects all natively support WebSockets, which enable full-duplex, bidirectional real-time communication.

Stateful, agentic MCP servers

When you build MCP servers on Cloudflare, each MCP client session is backed by a Durable Object, via the Agents SDK. This means each session can manage and persist its own state, backed by its own SQL database.

This opens the door to building stateful MCP servers. Rather than just acting as a stateless layer between a client app and an external API, MCP servers on Cloudflare can themselves be stateful applications — games, a shopping cart plus checkout flow, a persistent knowledge graph, or anything else you can dream up. When you build on Cloudflare, MCP servers can be much more than a layer in front of your REST API.

To understand the basics of how this works, let’s look at a minimal example that increments a counter:

For a given session, the MCP server above will remember the state of the counter across tool calls.

From within an MCP server, you can use Cloudflare’s whole developer platform, and have your MCP server spin up its own web browser, trigger a Workflow, call AI models, and more. We’re excited to see the MCP ecosystem evolve into more advanced use cases.

Connect to remote MCP servers from MCP clients that today only support local MCP

Cloudflare is supporting remote MCP early — before the most prominent MCP client applications support remote, authenticated MCP, and before other platforms support remote MCP. We’re doing this to give you a head start building for where MCP is headed.

But if you build a remote MCP server today, this presents a challenge — how can people start using your MCP server if there aren’t MCP clients that support remote MCP?

We have two new tools that allow you to test your remote MCP server and simulate how users will interact with it in the future:

We updated the Workers AI Playground to be a fully remote MCP client that allows you to connect to any remote MCP server with built-in authentication support. This online chat interface lets you immediately test your remote MCP servers without having to install anything on your device. Instead, just enter the remote MCP server’s URL (e.g. https://remote-server.example.com/sse) and click Connect.

Once you click Connect, you’ll go through the authentication flow (if you set one up) and after, you will be able to interact with the MCP server tools directly from the chat interface.

If you prefer to use a client like Claude Desktop or Cursor that already supports MCP but doesn’t yet handle remote connections with authentication, you can use mcp-remote. mcp-remote is an adapter that lets MCP clients that otherwise only support local connections to work with remote MCP servers. This gives you and your users the ability to preview what interactions with your remote MCP server will be like from the tools you’re already using today, without having to wait for the client to support remote MCP natively.

We’ve published a guide on how to use mcp-remote with popular MCP clients including Claude Desktop, Cursor, and Windsurf. In Claude Desktop, you add the following to your configuration file:

1800-mcp@cloudflare.com — start building remote MCP servers today

Remote Model Context Protocol (MCP) is coming! When client apps support remote MCP servers, the audience of people who can use them opens up from just us, developers, to the rest of the population — who may never even know what MCP is or stands for.

Building a remote MCP server is the way to bring your service into the AI assistants and tools that millions of people use. We’re excited to see many of the biggest companies on the Internet are busy building MCP servers right now, and we are curious about the businesses that pop-up in an agent-first, MCP-native way.

Hi Claude, build an MCP server on Cloudflare Workers

Dina Kozlov — Fri, 20 Dec 2024 14:50:00 GMT

In late November 2024, Anthropic announced a new way to interact with AI, called Model Context Protocol (MCP). Today, we’re excited to show you how to use MCP in combination with Cloudflare to extend the capabilities of Claude to build applications, generate images and more. You’ll learn how to build an MCP server on Cloudflare to make any service accessible through an AI assistant like Claude with just a few lines of code using Cloudflare Workers.

A quick primer on the Model Context Protocol (MCP)

MCP is an open standard that provides a universal way for LLMs to interact with services and applications. As the introduction on the MCP website puts it,

“Think of MCP like a USB-C port for AI applications. Just as USB-C provides a standardized way to connect your devices to various peripherals and accessories, MCP provides a standardized way to connect AI models to different data sources and tools.”

From an architectural perspective, MCP is comprised of several components:

MCP hosts: Programs or tools (like Claude) where AI models operate and interact with different services
MCP clients: Client within an AI assistant that initiates requests and communicates with MCP servers to perform tasks or access resources
MCP servers: Lightweight programs that each expose the capabilities of a service
Local data sources: Files, databases, and services on your computer that MCP servers can securely access
Remote services: External Internet-connected systems that MCP servers can connect to through APIs

Imagine you ask Claude to send a message in a Slack channel. Before Claude can do this, Slack must communicate which tools are available. It does this by defining tools — such as “list channels”, “post messages”, and “reply to thread” — in the MCP server. Once the MCP client knows what tools it should invoke, it can complete the task. All you have to do is tell it what you need, and it will get it done.

Allowing AI to not just generate, but deploy applications for you

What makes MCP so powerful? As a quick example, by combining it with a platform like Cloudflare Workers, it allows Claude users to deploy a Cloudflare Worker in just one sentence, resulting in a site like this:

But that’s just one example. Today, we’re excited to show you how you can build and deploy your own MCP server to allow your users to interact with your application directly from an LLM like Claude, and how you can do that just by writing a Cloudflare Worker.

Simplifying your MCP Server deployment with workers-mcp

The new workers-mcp tooling handles the translation between your code and the MCP standard, so that you don’t have to do the maintenance work to get it set up.

Once you create your Worker and install the MCP tooling, you’ll get a worker-mcp template set up for you. This boilerplate removes the overhead of configuring the MCP server yourself:

Let’s unpack what’s happening here. This provides a direct link to MCP. The ProxyToSelf logic ensures that your Worker is wired up to respond as an MCP server, without any complex routing or schema definitions.

It also provides tool definition with JSDoc. You’ll notice that the `sayHello` method is annotated with JSDoc comments describing what it does, what arguments it takes, and what it returns. These comments aren’t just for human readers, but they’re also used to generate documentation that your AI assistant (Claude) can understand.

Adding image generation to Claude

When you build an MCP server using Workers, adding custom functionality to an LLM is easy. Instead of setting up the server infrastructure, defining request schemas, all you have to do is write the code. Above, all we did was generate a “hello world”, but now let’s power up Claude to generate an image, using Workers AI:

Once you update the code and redeploy the Worker, Claude will now be able to use the new image generation tool. All you have to say is: “Hey! Can you create an image of a lava lamp wall that lives in San Francisco?”

If you’re looking for some inspiration, here are a few examples of what you can build with MCP and Workers:

Let Claude send follow-up emails on your behalf using Email Routing
Ask Claude to capture and share website previews via Browser Automation
Store and manage sessions, user data, or other persistent information with Durable Objects
Query and update data from your D1 database
…or call any of your existing Workers directly!

Why use Workers for building your MCP server?

To build out an MCP server without access to Cloudflare’s tooling, you would have to: initialize an instance of the server, define your APIs by creating explicit schemas for every interaction, handle request routing, ensure that the responses are formatted correctly, write handlers for every action, configure how the server will communicate, and more… As shown above, we do all of this for you.

For reference, an implementation may look something like this:

While this works, it requires quite a bit of code just to get started. Not only do you need to be familiar with the MCP protocol, but you need to complete a fair amount of set up work (e.g. defining schemas) for every action. Doing it through Workers removes all these barriers, allowing you to spin up an MCP server without the complexity.

We’re always looking for ways to simplify developer workflows, and we’re excited about this new standard to open up more possibilities for interacting with LLMs, and building agents.

If you’re interested in setting this up, check out this tutorial which walks you through these examples. We’re excited to see what you build. Be sure to share your MCP server creations with us on Discord, X, or Bluesky!