
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how Cloudflare products are built, the technologies we use, and how to join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Mon, 06 Apr 2026 01:41:16 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Astro is joining Cloudflare]]></title>
            <link>https://blog.cloudflare.com/astro-joins-cloudflare/</link>
            <pubDate>Fri, 16 Jan 2026 14:00:00 GMT</pubDate>
            <description><![CDATA[ The Astro Technology Company team — the creators of the Astro web framework — is joining Cloudflare. We’re doubling down on making Astro the best framework for content-driven websites, today and in the years to come. ]]></description>
            <content:encoded><![CDATA[ <p>The Astro Technology Company, creators of the Astro web framework, is joining Cloudflare.</p><p><a href="https://astro.build/"><u>Astro</u></a> is the web framework for building fast, content-driven websites. Over the past few years, we’ve seen an incredibly diverse range of developers and companies use Astro to build for the web. This ranges from established brands like Porsche and IKEA, to fast-growing AI companies like Opencode and OpenAI. Platforms that are built on Cloudflare, like <a href="https://webflow.com/feature/cloud"><u>Webflow Cloud</u></a> and <a href="https://vibe.wix.com/"><u>Wix Vibe</u></a>, have chosen Astro to power the websites their customers build and deploy to their own platforms. At Cloudflare, we use Astro, too — for our <a href="https://developers.cloudflare.com/"><u>developer docs</u></a>, <a href="https://workers.cloudflare.com/"><u>website</u></a>, <a href="https://sandbox.cloudflare.com/"><u>landing pages</u></a>, <a href="https://blog.cloudflare.com/"><u>blog</u></a>, and more. Astro is used almost everywhere there is content on the Internet. </p><p>By joining forces with the Astro team, we are doubling down on making Astro the best framework for content-driven websites for many years to come. The best version of Astro — <a href="https://github.com/withastro/astro/milestone/37"><u>Astro 6</u></a> —  is just around the corner, bringing a redesigned development server powered by Vite. The first public beta release of Astro 6 is <a href="https://github.com/withastro/astro/releases/tag/astro%406.0.0-beta.0"><u>now available</u></a>, with GA coming in the weeks ahead.</p><p>We are excited to share this news and even more thrilled for what it means for developers building with Astro. If you haven’t yet tried Astro — give it a spin and run <a href="https://docs.astro.build/en/getting-started/"><u>npm create astro@latest</u></a>.</p>
    <div>
      <h3>What this means for Astro</h3>
      <a href="#what-this-means-for-astro">
        
      </a>
    </div>
    <p>Astro will remain open source, MIT-licensed, and open to contributions, with a public roadmap and open governance. All full-time employees of The Astro Technology Company are now employees of Cloudflare, and will continue to work on Astro. We’re committed to Astro’s long-term success and eager to keep building.</p><p>Astro wouldn’t be what it is today without an incredibly strong community of open-source contributors. Cloudflare is also committed to continuing to support open-source contributions, via the <a href="https://astro.build/blog/astro-ecosystem-fund-update/"><u>Astro Ecosystem Fund</u></a>, alongside industry partners including Webflow, Netlify, Wix, Sentry, Stainless and many more.</p><p>From day one, Astro has been a bet on the web and portability: Astro is built to run anywhere, across clouds and platforms. Nothing changes about that. You can deploy Astro to any platform or cloud, and we’re committed to supporting Astro developers everywhere.</p>
    <div>
      <h3>There are many web frameworks out there — so why are developers choosing Astro?</h3>
      <a href="#there-are-many-web-frameworks-out-there-so-why-are-developers-choosing-astro">
        
      </a>
    </div>
    <p>Astro has been growing rapidly:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6SiPDolNqvmfQmHftQAr2W/b0b0b0c6725203b945d83da9b190c443/BLOG-3112_2.png" />
          </figure><p>Why? Many web frameworks have come and gone trying to be everything to everyone, aiming to serve the needs of both content-driven websites and web applications.</p><p>The key to Astro’s success: Instead of trying to serve every use case, Astro has stayed focused on <a href="https://docs.astro.build/en/concepts/why-astro/#design-principles"><u>five design principles</u></a>. Astro is…</p><ul><li><p><b>Content-driven:</b> Astro was designed to showcase your content.</p></li><li><p><b>Server-first:</b> Websites run faster when they render HTML on the server.</p></li><li><p><b>Fast by default:</b> It should be impossible to build a slow website in Astro.</p></li><li><p><b>Easy to use:</b> You don’t need to be an expert to build something with Astro.</p></li><li><p><b>Developer-focused:</b> You should have the resources you need to be successful.</p></li></ul><p>Astro’s <a href="https://docs.astro.build/en/concepts/islands/"><u>Islands Architecture</u></a> is a core part of what makes all of this possible. The majority of each page can be plain, static HTML — fast and simple to build by default, oriented around rendering content. And when you need it, you can render a specific part of a page as a client island, using any client UI framework. You can even mix and match multiple frameworks on the same page, whether that’s React, Vue, Svelte, Solid, or anything else:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1SjrMUpO9xZb0wxlATkrQo/16afe1efdb57da6b8b17cd804d94cfb2/BLOG-3112_3.png" />
          </figure>
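    <p>As a minimal sketch of what this looks like in practice, here is a hypothetical Astro page where everything ships as static HTML except a single interactive island (the <code>Counter</code> component and its path are illustrative, not from a real project):</p>
    <pre><code>---
// src/pages/index.astro
import Counter from '../components/Counter.jsx';
---
&lt;h1&gt;My storefront&lt;/h1&gt;
&lt;p&gt;This content renders to plain HTML; no JavaScript ships for it.&lt;/p&gt;

&lt;!-- Only this island hydrates in the browser, via the client:load directive --&gt;
&lt;Counter client:load /&gt;</code></pre>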
    <div>
      <h3>Bringing back the joy in building websites</h3>
      <a href="#bringing-back-the-joy-in-building-websites">
        
      </a>
    </div>
    <p>The more Astro and Cloudflare started talking, the clearer it became how much we have in common. Cloudflare’s mission is to help build a better Internet — and part of that is to help build a <i>faster</i> Internet. Almost all of us grew up building websites, and we want a world where people have fun building things on the Internet, where anyone can publish to a site that is truly their own.</p><p>When Astro first <a href="https://astro.build/blog/introducing-astro/"><u>launched</u></a> in 2021, building great websites had become painful — it felt like a fight with build tools and frameworks. It sounds strange to say it, with the coding agents and powerful LLMs of 2026, but in 2021 it was very hard to build an excellent and fast website without being a domain expert in JavaScript build tooling. So much has gotten better, both because of Astro and in the broader frontend ecosystem, that we take this almost for granted today.</p><p>The Astro project has spent the past five years working to simplify web development. As LLMs, then vibe coding, and now true coding agents came along and made it possible for anyone to build, Astro provided a foundation that was simple and fast by default. We’ve all seen how much better and faster agents get when building off the right foundation, in a well-structured codebase. More and more, we’ve seen both builders and platforms choose Astro as that foundation.</p><p>We’ve seen this most clearly through the platforms that both Cloudflare and Astro serve: platforms that extend Cloudflare to their own customers in creative ways using <a href="https://developers.cloudflare.com/cloudflare-for-platforms/"><u>Cloudflare for Platforms</u></a>, and that have chosen Astro as the framework their customers build on. </p><p>When you deploy to <a href="https://webflow.com/feature/cloud"><u>Webflow Cloud</u></a>, your Astro site just works and is deployed across Cloudflare’s network. 
When you start a new project with <a href="https://vibe.wix.com/"><u>Wix Vibe</u></a>, behind the scenes you’re creating an Astro site, running on Cloudflare. And when you generate a developer docs site using <a href="https://www.stainless.com/"><u>Stainless</u></a>, you get an Astro project, running on Cloudflare, powered by <a href="https://astro.build/blog/stainless-astro-launch/"><u>Starlight</u></a> — a framework built on Astro.</p><p>Each of these platforms is built for a different audience. But what they have in common — beyond their use of Cloudflare and Astro — is that they make it <i>fun</i> to create and publish content to the Internet. In a world where everyone can be both a builder and content creator, we think there are still so many more platforms to build and people to reach.</p>
    <div>
      <h3><b>Astro 6 — new local dev server, powered by Vite</b></h3>
      <a href="#astro-6-new-local-dev-server-powered-by-vite">
        
      </a>
    </div>
    <p>Astro 6 is coming, and the first open beta release is <a href="https://astro.build/blog/astro-6-beta/"><u>now available</u></a>. To be one of the first to try it out, run:</p><p><code>npm create astro@latest -- --ref next</code></p><p>Or to upgrade your existing Astro app, run:</p><p><code>npx @astrojs/upgrade beta</code></p><p>Astro 6 brings a brand new development server, built on the <a href="https://vite.dev/guide/api-environment"><u>Vite Environments API</u></a>, that runs your code locally using the same runtime that you deploy to. This means that when you run <code>astro dev</code> with the <a href="https://developers.cloudflare.com/workers/vite-plugin/"><u>Cloudflare Vite plugin</u></a>, your code runs in <a href="https://github.com/cloudflare/workerd"><u>workerd</u></a>, the open-source Cloudflare Workers runtime, and can use <a href="https://developers.cloudflare.com/durable-objects/"><u>Durable Objects</u></a>, <a href="https://developers.cloudflare.com/d1/"><u>D1</u></a>, <a href="https://developers.cloudflare.com/kv/"><u>KV</u></a>, <a href="https://developers.cloudflare.com/agents/"><u>Agents</u></a> and <a href="https://developers.cloudflare.com/workers/runtime-apis/bindings/"><u>more</u></a>. This isn’t just a Cloudflare feature: Any JavaScript runtime with a plugin that uses the Vite Environments API can benefit from this new support, and ensure local dev runs in the same environment, with the same runtime APIs as production.</p>
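    <p>For example, assuming the <code>@astrojs/cloudflare</code> adapter and a hypothetical KV namespace bound as <code>GREETINGS</code> in your Wrangler configuration, an Astro page can read from KV the same way in <code>astro dev</code> and in production:</p>
    <pre><code>---
// src/pages/hello.astro
// Assumes a KV namespace bound as GREETINGS in wrangler configuration
const { env } = Astro.locals.runtime;
const message = await env.GREETINGS.get('message');
---
&lt;p&gt;{message ?? 'Hello from workerd'}&lt;/p&gt;</code></pre>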
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4YAgzaSkgUr3gxK5Mkh62V/09847d3f15744b6f049864a6e898a343/BLOG-3112_4.png" />
          </figure><p><a href="https://docs.astro.build/en/reference/experimental-flags/live-content-collections/"><u>Live Content Collections</u></a> are now stable in Astro 6, graduating out of beta. These collections let you update data in real time, without requiring a rebuild of your site. This makes it easy to bring in content that changes often, such as the current inventory in a storefront, while still benefiting from the built-in validation and caching that come with Astro’s existing support for <a href="https://v6.docs.astro.build/en/guides/content-collections"><u>content collections</u></a>.</p><p>There’s more to Astro 6, including Astro’s most upvoted feature request — first-class support for Content Security Policy (CSP) — as well as simpler APIs, an upgrade to <a href="https://zod.dev/?id=introduction"><u>Zod</u></a> 4, and more.</p>
    <div>
      <h3>Doubling down on Astro</h3>
      <a href="#doubling-down-on-astro">
        
      </a>
    </div>
    <p>We’re thrilled to welcome the Astro team to Cloudflare. We’re excited to keep building, keep shipping, and keep making Astro the best way to build content-driven sites. We’re already thinking about what comes next beyond Astro 6, and we’d love to hear from you.</p><p>To keep up with the latest, follow the <a href="https://astro.build/blog/"><u>Astro blog</u></a> and join the <a href="https://astro.build/chat"><u>Astro Discord</u></a>. Tell us what you’re building!</p> ]]></content:encoded>
            <category><![CDATA[Acquisitions]]></category>
            <category><![CDATA[Application Services]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Workers AI]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[AI]]></category>
            <guid isPermaLink="false">6snDEFT5jgryV5wPhY4HEj</guid>
            <dc:creator>Fred Schott</dc:creator>
            <dc:creator>Brendan Irvine-Broque</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare's developer platform keeps getting better, faster, and more powerful. Here's everything that's new.]]></title>
            <link>https://blog.cloudflare.com/cloudflare-developer-platform-keeps-getting-better-faster-and-more-powerful/</link>
            <pubDate>Thu, 25 Sep 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare's developer platform keeps getting better, faster, and more powerful. Here's everything that's new. ]]></description>
            <content:encoded><![CDATA[ <p>When you build on Cloudflare, we consider it our job to do the heavy lifting for you. That’s been true since we <a href="https://blog.cloudflare.com/introducing-cloudflare-workers/"><u>introduced Cloudflare Workers in 2017</u></a>, when we first provided a runtime for you where you could just focus on building. </p><p>That commitment is still true today, and many of today’s announcements are focused on just that — removing friction where possible to free you up to build something great. </p><p>There are only so many blog posts we can write (and that you can read)! We have been busy with a much longer list of new improvements, and many of them we’ve been rolling out consistently over the course of the year. Today’s announcement breaks down all the new capabilities in detail, in one single post. The features being released today include:</p><ul><li><p><a href="#more-node-js-apis-and-packages-just-work-on-workers"><u>Use more APIs from Node.js</u></a> — including node:fs and node:https</p></li><li><p><a href="#ai-search-formerly-autorag-now-with-more-models-to-choose-from"><u>Use models from different providers in AI Search</u></a> (formerly AutoRAG)</p></li><li><p><a href="#larger-container-instances-more-concurrent-instances"><u>Deploy larger container instances and more concurrent instances</u></a> to our Containers platform</p></li><li><p>Run 30 concurrent headless web browsers (previously 10), via the <a href="#playwright-in-browser-rendering-is-now-ga"><u>Browser Rendering API</u></a></p></li><li><p>Use the <a href="#playwright-in-browser-rendering-is-now-ga"><u>Playwright browser automation library</u></a> with the Browser Rendering API — now fully supported and GA</p></li><li><p>Use 4 vCPUs (previously 2) and 20 GB of disk (previously 8 GB) with <a href="#workers-builds-provides-more-disk-and-cpu-and-is-now-ga"><u>Workers Builds — now GA</u></a></p></li><li><p><a href="#connect-to-production-services-and-resources-from-local-development-with-remote-bindings-now-ga"><u>Connect to production services and resources from local development with Remote Bindings</u></a> — now GA</p></li><li><p><a href="#infrequent-access-in-r2-is-now-ga"><u>R2 Infrequent Access GA</u></a> — a lower-cost storage class for backups, logs, and long-tail content</p></li><li><p><a href="#resize-clip-and-reformat-video-files-on-demand-with-media-transformations-now-ga"><u>Resize, clip and reformat video files on-demand with Media Transformations</u></a> — now GA</p></li></ul><p>Alongside that, we’re constantly adding new building blocks, to make sure you have all the tools you need to build what you set out to. Those launches (that also went out today, but require a bit more explanation) include:</p><ul><li><p>Connect to Postgres databases <a href="http://blog.cloudflare.com/planetscale-postgres-workers"><u>running on PlanetScale</u></a></p></li><li><p>Send transactional emails via the new <a href="http://blog.cloudflare.com/email-service"><u>Cloudflare Email Service</u></a></p></li><li><p>Run distributed SQL queries with the new <a href="http://blog.cloudflare.com/cloudflare-data-platform"><u>Cloudflare Data Platform</u></a></p></li><li><p>Deploy your own <a href="https://www.cloudflare.com/learning/ai/how-to-get-started-with-vibe-coding/">AI vibe coding</a> platform to Cloudflare with <a href="https://blog.cloudflare.com/deploy-your-own-ai-vibe-coding-platform"><u>VibeSDK</u></a></p></li></ul>
    <div>
      <h2>AI Search (formerly AutoRAG) — now with more models to choose from</h2>
      <a href="#ai-search-formerly-autorag-now-with-more-models-to-choose-from">
        
      </a>
    </div>
    <p>AutoRAG is now AI Search! The new name marks a bigger mission: to make world-class search infrastructure available to every developer and business. AI Search is no longer just about retrieval for LLM apps: it’s about giving you a fast, flexible index for your content that is ready to power any AI experience. With recent additions like <a href="https://blog.cloudflare.com/conversational-search-with-nlweb-and-autorag/"><u>NLWeb support</u></a>, we are expanding beyond simple retrieval to provide a foundation for top-quality search experiences that are open and built for the future of the web.</p><p>With AI Search you can now use models from different providers like OpenAI and Anthropic. Last month during AI Week we announced <a href="https://blog.cloudflare.com/ai-gateway-aug-2025-refresh/"><u>BYO Provider Keys for AI Gateway</u></a>. That capability now extends to AI Search. By attaching your keys to the AI Gateway linked to your AI Search instance, you can use many more models for both embedding and inference.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5RUPN3CB5MOHuF0qcaJ9Nq/527f20fb8c2109c2007a5a3eeffaaadc/image2.png" />
          </figure><p>Once configured, your AI Search instance will be able to reference models available through your AI Gateway when making a <code>/ai-search</code> request:</p>
            <pre><code>export default {
  async fetch(request, env) {
    
    // Query your AI Search instance with a natural language question to an OpenAI model
    const result = await env.AI.autorag("my-ai-search").aiSearch({
      query: "What's new for Cloudflare Birthday Week?",
      model: "openai/gpt-5"
    });

    // Return only the generated answer as plain text
    return new Response(result.response, {
      headers: { "Content-Type": "text/plain" },
    });
  },
};</code></pre>
            <p>In the coming weeks we will also roll out updates to align the APIs with the new name. The existing APIs will continue to be supported for the time being. Stay tuned to the AI Search <a href="https://developers.cloudflare.com/changelog/?product=ai-search"><u>Changelog</u></a> and <a href="https://discord.cloudflare.com/"><u>Discord</u></a> for more updates!</p>
    <div>
      <h2>Connect to production services and resources from local development with Remote Bindings — now GA</h2>
      <a href="#connect-to-production-services-and-resources-from-local-development-with-remote-bindings-now-ga">
        
      </a>
    </div>
    <p><a href="https://developers.cloudflare.com/workers/development-testing/#remote-bindings"><u>Remote bindings</u></a> for local development are generally available, supported in <a href="https://developers.cloudflare.com/workers/wrangler/"><u>Wrangler</u></a> v4.37.0, the <a href="https://developers.cloudflare.com/workers/vite-plugin/"><u>Cloudflare Vite plugin</u></a>, and the <code>@cloudflare/vitest-pool-workers</code> package. Remote bindings are bindings configured to connect to a deployed resource on your Cloudflare account <i>instead</i> of the locally simulated resource. </p><p>For example, here’s how you can instruct Wrangler or Vite to send all requests to <code>env.MY_BUCKET</code> to hit the real, deployed R2 bucket instead of a locally simulated one: </p>
            <pre><code>{
  "name": "my-worker",
  "compatibility_date": "2025-09-25",

  "r2_buckets": [
    {
      "bucket_name": "my-bucket",
      "binding": "MY_BUCKET",
      "remote": true
    },
  ],
}</code></pre>
            <p>With the above configuration, all requests to <code>env.MY_BUCKET</code> will be proxied to the remote resource, but the Worker code will still execute locally. This means you get all the benefits of local development, like faster execution times, without having to seed local databases with data. </p><p>You can pair remote bindings with <a href="https://developers.cloudflare.com/workers/wrangler/environments/"><b><u>environments</u></b></a>, so that you can use staging data during local development and leave production data untouched. </p><p>For example, here’s how you could configure Wrangler or Vite to send all requests for <code>env.MY_BUCKET</code> to <code>staging-storage-bucket</code> when you run <code>wrangler dev --env staging</code> (or <code>CLOUDFLARE_ENV=staging vite dev</code> if using Vite): </p>
            <pre><code>{
  "name": "my-worker",
  "compatibility_date": "2025-09-25",

"env": {
    "staging": {
      "r2_buckets": [
        {
          "binding": "MY_BUCKET",
          "bucket_name": "staging-storage-bucket",
          "remote": true
        }
      ]
    },
    "production": {
      "r2_buckets": [
        {
          "binding": "MY_BUCKET",
          "bucket_name": "production-storage-bucket" 
        }
      ]
    }
  }
}</code></pre>
            
    <div>
      <h2>More Node.js APIs and packages “just work” on Workers</h2>
      <a href="#more-node-js-apis-and-packages-just-work-on-workers">
        
      </a>
    </div>
    <p>Over the past year, we have been hard at work making Workers more compatible with Node.js packages and APIs.</p><p>Several weeks ago, <a href="https://blog.cloudflare.com/bringing-node-js-http-servers-to-cloudflare-workers/"><u>we shared how node:http and node:https APIs are now supported on Workers</u></a>. This means that you can run backend Express and Koa.js apps with only a few additional lines of code:</p>
            <pre><code>import { httpServerHandler } from 'cloudflare:node';
import express from 'express';

const app = express();

app.get('/', (req, res) =&gt; {
  res.json({ message: 'Express.js running on Cloudflare Workers!' });
});

app.listen(3000);
export default httpServerHandler({ port: 3000 });</code></pre>
            <p>And there’s much, much more. You can now:</p><ul><li><p>Read and write temporary files in Workers, using <code>node:fs</code></p></li><li><p>Do DNS lookups using <a href="https://one.one.one.one/"><u>1.1.1.1</u></a> with <code>node:dns</code></p></li><li><p>Use <code>node:net</code> and <code>node:tls</code> for first-class socket support</p></li><li><p>Use common hashing libraries with <code>node:crypto</code></p></li><li><p>Access environment variables in a Node-like fashion on <code>process.env</code></p></li></ul><p><a href="https://blog.cloudflare.com/nodejs-workers-2025"><u>Read our full recap of the last year’s Node.js-related changes</u></a> for all the details.</p><p>With these changes, Workers become even more powerful and easier to adopt, regardless of where you’re coming from. The APIs that you are familiar with are there, and more of the packages you need will just work.</p>
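            <p>As a small, hypothetical example, here’s a Worker that hashes the request path with <code>node:crypto</code> (a minimal sketch; it assumes the <code>nodejs_compat</code> compatibility flag is enabled):</p>
            <pre><code>import { createHash } from 'node:crypto';

export default {
  async fetch(request) {
    // Hash the request path using the same API you would use in Node.js
    const digest = createHash('sha256')
      .update(new URL(request.url).pathname)
      .digest('hex');

    return new Response(digest, {
      headers: { 'Content-Type': 'text/plain' },
    });
  },
};</code></pre>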
    <div>
      <h2>Larger Container instances, more concurrent instances</h2>
      <a href="#larger-container-instances-more-concurrent-instances">
        
      </a>
    </div>
    <p><a href="https://developers.cloudflare.com/containers/"><u>Cloudflare Containers</u></a> now has higher limits on concurrent instances and an upcoming new, larger instance type.</p><p>Previously you could run 50 instances of the <code>dev</code> instance type or 25 instances of the <code>basic</code> instance type concurrently. Now you can run concurrent containers with up to 400 GiB of memory, 100 vCPUs, and 2 TB of disk. This allows you to run up to 1000 <code>dev</code> instances or 400 <code>basic</code> instances concurrently. Enterprise customers can push far beyond these limits — contact us if you need more. If you are using Containers to power your app and it goes viral, you’ll have the ability to scale on Cloudflare.</p><p>Cloudflare Containers is also getting a new <a href="https://developers.cloudflare.com/containers/platform-details/limits/"><u>instance type</u></a> soon — <code>standard-2</code>, which includes 8 GiB of memory, 1 vCPU, and 12 GB of disk. This new instance type is an ideal default for workloads that need more resources, from <a href="https://github.com/cloudflare/sandbox-sdk"><u>AI Sandboxes</u></a> to data processing jobs.</p>
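    <p>For illustration, sizing a container in Wrangler configuration looks roughly like this (the class and image names are placeholders, and the Durable Object plumbing that Containers use is omitted for brevity):</p>
            <pre><code>{
  "name": "my-worker",
  "compatibility_date": "2025-09-25",

  "containers": [
    {
      "class_name": "MyContainer",
      "image": "./Dockerfile",
      // Choose the instance type that fits your workload;
      // "standard-2" is the new, larger type once it is available
      "instance_type": "basic",
      "max_instances": 400
    }
  ]
}</code></pre>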
    <div>
      <h2>Workers Builds provides more disk and CPU — and is now GA</h2>
      <a href="#workers-builds-provides-more-disk-and-cpu-and-is-now-ga">
        
      </a>
    </div>
    <p>Last Birthday Week, we <a href="https://blog.cloudflare.com/builder-day-2024-announcements/#continuous-integration-and-delivery"><u>announced the launch</u></a> of our integrated <a href="https://www.cloudflare.com/learning/serverless/glossary/what-is-ci-cd/">CI/CD pipeline</a>, Workers Builds, in open beta. We also gave you <a href="https://blog.cloudflare.com/workers-builds-integrated-ci-cd-built-on-the-workers-platform/"><u>a detailed look</u></a> into how we built this system on our <a href="https://developers.cloudflare.com/workers/"><u>Workers platform</u></a> using <a href="https://developers.cloudflare.com/containers/"><u>Containers</u></a>, <a href="https://developers.cloudflare.com/durable-objects/"><u>Durable Objects</u></a>, <a href="https://developers.cloudflare.com/hyperdrive/"><u>Hyperdrive</u></a>, <a href="https://developers.cloudflare.com/log-explorer/log-search/"><u>Workers Logs</u></a>, and <a href="https://developers.cloudflare.com/workers/configuration/smart-placement/"><u>Smart Placement</u></a>.</p><p>This year, we are excited to announce that Workers Builds is now Generally Available. 
Here’s what’s new:</p><ul><li><p><a href="https://developers.cloudflare.com/changelog/2025-08-04-builds-increased-disk-size/"><b><u>Increased disk space for all plans</u></b></a>: We’ve increased the disk size from 8 GB to 20 GB for both free and paid plans, giving you more space for your projects and dependencies</p></li><li><p><a href="https://developers.cloudflare.com/changelog/2025-09-07-builds-increased-cpu-paid/"><b><u>More compute for paid plans</u></b></a>: We’ve doubled the CPU power for paid plans from 2 vCPU to 4 vCPU, making your builds significantly faster</p></li><li><p><b>Faster single-core and multi-core performance</b>: To ensure consistent, high-performance builds, we now run your builds on the fastest available CPUs at the time your build runs</p></li></ul><p>Haven’t used <a href="https://developers.cloudflare.com/workers/ci-cd/builds/"><u>Workers Builds</u></a> yet? You can try it by <a href="https://developers.cloudflare.com/workers/ci-cd/builds/"><u>connecting a Git repository to an existing Worker</u></a>, or try it out on a fresh project by clicking any <a href="https://developers.cloudflare.com/workers/platform/deploy-buttons/"><u>Deploy to Cloudflare button</u></a>, like the one below that deploys <a href="https://github.com/cloudflare/templates/tree/main/astro-blog-starter-template"><u>a blog built with Astro</u></a> to your Cloudflare account:</p><a href="https://deploy.workers.cloudflare.com/?url=https://github.com/cloudflare/templates/tree/main/astro-blog-starter-template"><img src="https://deploy.workers.cloudflare.com/button" /></a>
<p></p>
    <div>
      <h2>A more consistent look and feel for the Cloudflare dashboard</h2>
      <a href="#a-more-consistent-look-and-feel-for-the-cloudflare-dashboard">
        
      </a>
    </div>
    <p><a href="https://dash.cloudflare.com/?to=/:account/workers/durable-objects"><u>Durable Objects</u></a>, <a href="https://dash.cloudflare.com/?to=/:account/r2"><u>R2</u></a>, and <a href="https://dash.cloudflare.com/?to=/:account/workers-and-pages"><u>Workers</u></a> now all have a more consistent look with the rest of our developer platform. As you explore these pages you’ll find that things should load faster, feel smoother, and be easier to use.</p><p>Across storage products, you can now customize the table that lists the resources on your account, choose which data you want to see, sort by any column, and hide columns you don’t need. In the Workers and Pages dashboard, we’ve reduced clutter and have modernized the design to make it faster for you to get the data you need.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/0NkomTAkx45wn7nF4WYip/efb7b706d0ab7df34bfe229a025f4782/image4.png" />
          </figure><p>And when you create a new <a href="https://developers.cloudflare.com/pipelines/"><u>Pipeline</u></a> or a <a href="https://developers.cloudflare.com/hyperdrive"><u>Hyperdrive</u></a> configuration, you’ll find a new interface that helps you get started and guides you through each step.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2Lx2tgIBRIWsq82p3Vjj3o/3a23f7065c9f354ed66dbe311f5d1d86/image1.png" />
          </figure><p>This work is ongoing, and we’re excited to continue improving with the help of your feedback, so keep it coming!</p>
    <div>
      <h2>Resize, clip and reformat video files on-demand with Media Transformations — now GA</h2>
      <a href="#resize-clip-and-reformat-video-files-on-demand-with-media-transformations-now-ga">
        
      </a>
    </div>
    <p>In March 2025 we <a href="https://blog.cloudflare.com/media-transformations-for-video-open-beta/"><u>announced Media Transformations</u></a> in open beta, which brings the magic of <a href="https://developers.cloudflare.com/images/transform-images/"><u>Image transformations</u></a> to short-form video files — including video files stored outside of Cloudflare. Since then, we have increased input and output limits, and added support for audio-only extraction. Media Transformations is now generally available.</p><p>Media Transformations is ideal if you have a large existing volume of short videos, such as generative AI output, e-commerce product videos, social media clips, or short marketing content. Content like this should be fetched from your existing storage like R2 or S3 directly, optimized by Cloudflare quickly, and delivered efficiently as small MP4 files or used to extract still images and audio.</p>
            <pre><code>https://example.com/cdn-cgi/media/&lt;OPTIONS&gt;/&lt;SOURCE-VIDEO&gt;

EXAMPLE, RESIZE:
https://example.com/cdn-cgi/media/width=760/https://pub-d9fcbc1abcd244c1821f38b99017347f.r2.dev/aus-mobile.mp4


EXAMPLE, STILL THUMBNAIL:
https://example.com/cdn-cgi/media/mode=frame,time=3s,width=120,height=120,fit=cover/https://pub-d9fcbc1abcd244c1821f38b99017347f.r2.dev/aus-mobile.mp4</code></pre>
            <p>Media Transformations includes a free tier available to all customers and is included with Media Platform subscriptions. Check out the <a href="https://developers.cloudflare.com/stream/transform-videos/"><u>transform videos documentation</u></a> for all the latest, then enable transformations for your zone today!</p>
    <div>
      <h2>Infrequent Access in R2 is now GA</h2>
      <a href="#infrequent-access-in-r2-is-now-ga">
        
      </a>
    </div>
    <p>R2 Infrequent Access is now generally available. Last year, we introduced the <a href="https://blog.cloudflare.com/r2-events-gcs-migration-infrequent-access/#infrequent-access-private-beta"><u>Infrequent Access</u></a> storage class designed for data that doesn’t need to be accessed frequently. It’s a great fit for use cases including long-tail user content, logs, or data backups.</p><p>Since launch, Infrequent Access has been proven in production by our customers running these types of workloads at scale. The results confirmed our goal: a storage class that reduces storage costs while maintaining performance and durability.</p><p><a href="https://developers.cloudflare.com/r2/pricing/"><u>Pricing</u></a> is simple. You pay less on data storage, while data retrievals are billed per GB to reflect the additional compute required to serve data from underlying storage optimized for less frequent access. And as with all of R2, there are <b>no egress fees</b>, so you don’t pay for the bandwidth to move data out.

Here’s how you can upload an object to the R2 Infrequent Access storage class via Workers:</p>
            <pre><code>export default {
  async fetch(request, env) {

    // Upload the incoming request body to R2 in Infrequent Access class
    await env.MY_BUCKET.put("my-object", request.body, {
      storageClass: "InfrequentAccess",
    });

    return new Response("Object uploaded to Infrequent Access!", {
      headers: { "Content-Type": "text/plain" },
    });
  },
};</code></pre>
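            <p>Since <code>storageClass</code> is set per object, you can mix classes within a single bucket. As an illustration (the helper and the prefix rule below are our own convention, not part of the R2 API), you could route keys to a class before calling <code>put</code>:</p>

```typescript
// Illustrative sketch (helper name and prefix rule are ours): decide which
// storage class each object should use. "Standard" and "InfrequentAccess"
// are the two storage classes R2 supports.
type R2StorageClass = "Standard" | "InfrequentAccess";

function storageClassFor(key: string): R2StorageClass {
  // Example policy: logs and backups are rarely read back, so store them
  // in Infrequent Access; everything else stays in Standard.
  return /^(logs|backups)\//.test(key) ? "InfrequentAccess" : "Standard";
}
```

<p>The result can then be passed as the <code>storageClass</code> option in the <code>put</code> call above.</p>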
            <p>You can also monitor your Infrequent Access vs. Standard storage usage directly in your R2 dashboard for each bucket. Get started with <a href="https://developers.cloudflare.com/r2/get-started/"><u>R2</u></a> today!</p>
    <div>
      <h2>Playwright in Browser Rendering is now GA</h2>
      <a href="#playwright-in-browser-rendering-is-now-ga">
        
      </a>
    </div>
    <p>We’re excited to announce three updates to Browser Rendering:</p><ol><li><p>Our support for <a href="https://developers.cloudflare.com/browser-rendering/platform/playwright/"><u>Playwright</u></a> is now Generally Available, giving developers the stability and confidence to run critical browser tasks.</p></li><li><p>We’re introducing support for <a href="https://developers.cloudflare.com/browser-rendering/platform/stagehand/"><u>Stagehand</u></a>, enabling developers to build AI agents using natural language, powered by Cloudflare Workers AI.</p></li><li><p>Finally, to help developers scale, we are tripling <a href="https://developers.cloudflare.com/browser-rendering/platform/limits/#workers-paid"><u>limits for paid plans</u></a>, with more increases to come.</p></li></ol><p>The browser is no longer used only by humans. AI agents need to be able to reliably navigate browsers in the same way a human would, whether that's booking flights, filling in customer info, or scraping structured data. Playwright gives AI agents the ability to interact with web pages and perform complex tasks on behalf of humans. However, running browsers at scale is a significant infrastructure challenge. Cloudflare Browser Rendering solves this by providing headless browsers on demand. With Playwright support now Generally Available, and synced with the latest version, v1.55, customers have a production-ready foundation to build reliable, scalable applications on.</p><p>To help AI agents better navigate the web, we’re introducing support for Stagehand, an open source browser automation framework. Rather than dictating exact steps or specifying selectors, Stagehand enables developers to build more reliably and flexibly by combining code with natural-language instructions powered by AI. This makes it possible for AI agents to navigate and adapt if a website changes, just like a human would.
</p><p>To get started with Playwright and Stagehand, check our <a href="https://developers.cloudflare.com/changelog/2025-09-25-br-playwright-ga-stagehand-limits/"><u>changelog</u></a> with code examples and more. </p><div>
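<p>One way to keep browser automation portable is to write tasks against the Playwright page interface rather than a specific launch mechanism, so the same code runs locally and on Browser Rendering. A minimal sketch (the <code>PageLike</code> interface and <code>fetchTitle</code> function are our own illustration; on Workers, the page would come from a browser launched through your Browser Rendering binding):</p>

```typescript
// Sketch (names are ours): a task written against a minimal, Page-shaped
// interface, so it works with any Playwright-compatible page object,
// including one running on Cloudflare Browser Rendering.
interface PageLike {
  goto(url: string): Promise<unknown>;
  title(): Promise<string>;
}

async function fetchTitle(page: PageLike, url: string): Promise<string> {
  await page.goto(url); // navigate, then read the document title
  return page.title();
}
```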
  
</div><p></p> ]]></content:encoded>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <guid isPermaLink="false">4HnTgcx06k7ccxS0rYwIkC</guid>
            <dc:creator>Brendan Irvine-Broque</dc:creator>
            <dc:creator>Rita Kozlov</dc:creator>
            <dc:creator>Korinne Alpers</dc:creator>
        </item>
        <item>
            <title><![CDATA[Build and deploy Remote Model Context Protocol (MCP) servers to Cloudflare]]></title>
            <link>https://blog.cloudflare.com/remote-model-context-protocol-servers-mcp/</link>
            <pubDate>Tue, 25 Mar 2025 13:59:00 GMT</pubDate>
            <description><![CDATA[ You can now build and deploy remote MCP servers to Cloudflare, and we handle the hard parts of building remote MCP servers for you. ]]></description>
            <content:encoded><![CDATA[ <p>It feels like almost everyone building AI applications and <a href="https://www.cloudflare.com/learning/ai/what-is-agentic-ai/">agents</a> is talking about the <a href="https://www.cloudflare.com/learning/ai/what-is-model-context-protocol-mcp/">Model Context Protocol</a> (MCP), as well as building MCP servers that you install and run locally on your own computer.</p><p>You can now <a href="https://developers.cloudflare.com/agents/guides/remote-mcp-server/"><u>build and deploy remote MCP servers</u></a> to Cloudflare. We’ve added four things to Cloudflare that handle the hard parts of building remote MCP servers for you:</p><ol><li><p><a href="https://developers.cloudflare.com/agents/model-context-protocol/authorization"><u>workers-oauth-provider</u></a> — an <a href="https://www.cloudflare.com/learning/access-management/what-is-oauth/"><u>OAuth</u></a> Provider that makes authorization easy</p></li><li><p><a href="https://developers.cloudflare.com/agents/model-context-protocol/tools/"><u>McpAgent</u></a> — a class built into the <a href="https://developers.cloudflare.com/agents/"><u>Cloudflare Agents SDK</u></a> that handles remote transport</p></li><li><p><a href="https://developers.cloudflare.com/agents/guides/test-remote-mcp-server/"><u>mcp-remote</u></a> — an adapter that lets MCP clients that otherwise only support local connections work with remote MCP servers</p></li><li><p><a href="https://playground.ai.cloudflare.com/"><u>AI playground as a remote MCP client</u></a> — a chat interface that allows you to connect to remote MCP servers, with the authentication check included</p></li></ol><p>The button below, or the <a href="https://developers.cloudflare.com/agents/guides/remote-mcp-server/"><u>developer docs</u></a>, will get you up and running in production with <a href="https://github.com/cloudflare/ai/tree/main/demos/remote-mcp-server"><u>this example MCP server</u></a> in less than two minutes:</p><a 
href="https://deploy.workers.cloudflare.com/?url=https://github.com/cloudflare/ai/tree/main/demos/remote-mcp-server"><img src="https://deploy.workers.cloudflare.com/button" /></a>
<p></p><p>Unlike the local MCP servers you may have previously used, remote MCP servers are accessible on the Internet. People simply sign in and grant permissions to MCP clients using familiar authorization flows. We think this is going to be a massive deal — connecting coding agents to MCP servers has blown developers’ minds over the past few months, and remote MCP servers have the same potential to open up similar new ways of working with LLMs and agents to a much wider audience, including more everyday consumer use cases.</p>
    <div>
      <h2>From local to remote — bringing MCP to the masses</h2>
      <a href="#from-local-to-remote-bringing-mcp-to-the-masses">
        
      </a>
    </div>
    <p>MCP is quickly becoming the common protocol that enables LLMs to go beyond <a href="https://www.cloudflare.com/learning/ai/inference-vs-training/"><u>inference</u></a> and <a href="https://developers.cloudflare.com/reference-architecture/diagrams/ai/ai-rag/"><u>RAG</u></a>, and take actions that require access beyond the AI application itself (like sending an email, deploying a code change, publishing blog posts, you name it). It enables AI agents (MCP clients) to access tools and resources from external services (MCP servers).</p><p>To date, MCP has been limited to running locally on your own machine — if you want to access a tool on the web using MCP, it’s up to you to set up the server locally. You haven’t been able to use MCP from web-based interfaces or mobile apps, and there hasn’t been a way to let people authenticate and grant the MCP client permission. Effectively, MCP servers haven’t yet been brought online.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1EyiTXzB4FvBs2zEfzuNTp/5ce4b55457348e9ab83e6d9cf35d8c3c/image7.png" />
          </figure><p>Support for <a href="https://spec.modelcontextprotocol.io/specification/draft/basic/transports/#streamable-http"><u>remote MCP connections</u></a> changes this. It creates the opportunity to reach a wider audience of Internet users who aren’t going to install and run MCP servers locally for use with desktop apps. Remote MCP support is like the transition from desktop software to web-based software. People expect to continue tasks across devices and to login and have things just work. Local MCP is great for developers, but remote MCP connections are the missing piece to reach everyone on the Internet.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7bI7rJtLh89jmZaibSgiLl/e426f93616a8210d80b979c47d89dc75/image4.png" />
          </figure>
    <div>
      <h2>Making authentication and authorization just work with MCP</h2>
      <a href="#making-authentication-and-authorization-just-work-with-mcp">
        
      </a>
    </div>
    <p>Beyond just changing the transport layer — from <a href="https://modelcontextprotocol.io/docs/concepts/transports#standard-input%2Foutput-stdio"><u>stdio</u></a> to <a href="https://github.com/modelcontextprotocol/specification/pull/206"><u>streamable HTTP</u></a> — when you build a remote MCP server that uses information from the end user’s account, you need <a href="https://www.cloudflare.com/learning/access-management/authn-vs-authz/"><u>authentication and authorization</u></a>. You need a way to allow users to log in and prove who they are (authentication) and a way for users to control what the AI agent will be able to access when using a service (authorization).</p><p>MCP does this with <a href="https://oauth.net/2/"><u>OAuth</u></a>, the standard protocol that allows users to grant applications access to their information or services without sharing passwords. Here, the MCP Server itself acts as the OAuth Provider. However, OAuth with MCP is hard to implement yourself, so when you build MCP servers on Cloudflare we provide it for you.</p>
    <div>
      <h3>workers-oauth-provider — an OAuth 2.1 Provider library for Cloudflare Workers</h3>
      <a href="#workers-oauth-provider-an-oauth-2-1-provider-library-for-cloudflare-workers">
        
      </a>
    </div>
    <p>When you <a href="https://developers.cloudflare.com/agents/guides/remote-mcp-server/"><u>deploy an MCP Server</u></a> to Cloudflare, your Worker acts as an OAuth Provider, using <a href="https://github.com/cloudflare/workers-oauth-provider"><u>workers-oauth-provider</u></a>, a new TypeScript library that wraps your Worker’s code, adding authorization to API endpoints, including (but not limited to) MCP server API endpoints.</p><p>Your MCP server will receive the already-authenticated user details as a parameter. You don’t need to perform any checks of your own, or directly manage tokens. You can still fully control how you authenticate users: from what UI they see when they log in, to which provider they use to log in. You can choose to bring your own third-party authentication and authorization providers like Google or GitHub, or integrate with your own.</p><p>The complete <a href="https://spec.modelcontextprotocol.io/specification/draft/basic/authorization/"><u>MCP OAuth flow</u></a> looks like this:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/VTPBfZ4hRPdq2TWE5VOjS/00abc97e4beedf59a4101957612fd503/image5.png" />
          </figure><p>Here, your MCP server acts as both an OAuth client to your upstream service, <i>and</i> as an OAuth server (also referred to as an OAuth “provider”) to MCP clients. You can use any upstream authentication flow you want, but workers-oauth-provider guarantees that your MCP server is <a href="https://spec.modelcontextprotocol.io/specification/draft/basic/authorization"><u>spec-compliant</u></a> and able to work with the full range of client apps &amp; websites. This includes support for Dynamic Client Registration (<a href="https://datatracker.ietf.org/doc/html/rfc7591"><u>RFC 7591</u></a>) and Authorization Server Metadata (<a href="https://datatracker.ietf.org/doc/html/rfc8414"><u>RFC 8414</u></a>).</p>
    <div>
      <h3>A simple, pluggable interface for OAuth</h3>
      <a href="#a-simple-pluggable-interface-for-oauth">
        
      </a>
    </div>
    <p>When you build an MCP server with Cloudflare Workers, you create an instance of the OAuth Provider, giving it the paths to your authorization, token, and client registration endpoints, along with <a href="https://developers.cloudflare.com/workers/runtime-apis/handlers/fetch/"><u>handlers</u></a> for your MCP Server and for auth:</p>
            <pre><code>import OAuthProvider from "@cloudflare/workers-oauth-provider";
import MyMCPServer from "./my-mcp-server";
import MyAuthHandler from "./auth-handler";

export default new OAuthProvider({
  apiRoute: "/sse", // MCP clients connect to your server at this route
  apiHandler: MyMCPServer.mount('/sse'), // Your MCP Server implementation
  defaultHandler: MyAuthHandler, // Your authentication implementation
  authorizeEndpoint: "/authorize",
  tokenEndpoint: "/token",
  clientRegistrationEndpoint: "/register",
});</code></pre>
            <p>This abstraction lets you easily plug in your own authentication. Take a look at <a href="https://github.com/cloudflare/ai/blob/main/demos/remote-mcp-github-oauth/src/github-handler.ts"><u>this example</u></a> that uses GitHub as the identity provider for an MCP server, in less than 100 lines of code, by implementing /callback and /authorize routes.</p>
    <div>
      <h3>Why do MCP servers issue their own tokens?</h3>
      <a href="#why-do-mcp-servers-issue-their-own-tokens">
        
      </a>
    </div>
    <p>You may have noticed in the authorization diagram above, and in the <a href="https://spec.modelcontextprotocol.io/specification/draft/basic/authorization"><u>authorization section</u></a> of the MCP spec, that the MCP server issues its own token to the MCP client.</p><p>Instead of passing the token it receives from the upstream provider directly to the MCP client, your Worker stores an encrypted access token in <a href="https://developers.cloudflare.com/kv/"><u>Workers KV</u></a>. It then issues its own token to the client. As shown in the <a href="https://github.com/cloudflare/ai/blob/main/demos/remote-mcp-github-oauth/src/github-handler.ts"><u>GitHub example</u></a> above, this is handled on your behalf by the workers-oauth-provider — your code never directly handles writing this token, preventing mistakes. You can see this in the following code snippet from <a href="https://github.com/cloudflare/ai/blob/main/demos/remote-mcp-github-oauth/src/github-handler.ts"><u>the same example</u></a>:</p>
            <pre><code>  // When you call completeAuthorization, the accessToken you pass to it
  // is encrypted and stored, and never exposed to the MCP client
  // A new, separate token is generated and provided to the client at the /token endpoint
  const { redirectTo } = await c.env.OAUTH_PROVIDER.completeAuthorization({
    request: oauthReqInfo,
    userId: login,
    metadata: { label: name },
    scope: oauthReqInfo.scope,
    props: {
      accessToken,  // Stored encrypted, never sent to MCP client
    },
  })

  return Response.redirect(redirectTo)</code></pre>
            <p>On the surface, this indirection might sound more complicated. Why does it work this way?</p><p>By issuing its own token, MCP Servers can restrict access and enforce more granular controls than the upstream provider. If a token you issue to an MCP client is compromised, the attacker only gets the limited permissions you've explicitly granted through your MCP tools, not the full access of the original token.</p><p>Let’s say your MCP server requests that the user authorize permission to read emails from their Gmail account, using the <a href="https://developers.google.com/identity/protocols/oauth2/scopes#gmail"><u>gmail.readonly scope</u></a>. The tool that the MCP server exposes is narrower, and allows reading travel booking notifications from a limited set of senders, to handle a question like “What’s the check-out time for my hotel room tomorrow?” You can enforce this constraint in your MCP server, and if the token you issue to the MCP client is compromised, because that token is for your MCP server — and not the raw token to the upstream provider (Google) — an attacker cannot use it to read arbitrary emails. They can only call the tools your MCP server provides. OWASP calls out <a href="https://genai.owasp.org/llmrisk/llm062025-excessive-agency/"><u>“Excessive Agency”</u></a> as one of the top risk factors for building AI applications, and by issuing its own token to the client and enforcing constraints, your MCP server can limit tool access to only what the client needs.</p><p>Or, building off the earlier GitHub example, you can enforce that only a specific user is allowed to access a particular tool. In the example below, only users that are part of an allowlist can see or call the generateImage tool, which uses <a href="https://developers.cloudflare.com/workers-ai/"><u>Workers AI</u></a> to generate an image based on a prompt:</p>
            <pre><code>import { McpAgent } from "agents/mcp";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

const USER_ALLOWLIST = ["geelen"];

export class MyMCP extends McpAgent&lt;Props, Env&gt; {
  server = new McpServer({
    name: "Github OAuth Proxy Demo",
    version: "1.0.0",
  });

  async init() {
    // Dynamically add tools based on the user's identity
    if (USER_ALLOWLIST.includes(this.props.login)) {
      this.server.tool(
        'generateImage',
        'Generate an image using the flux-1-schnell model.',
        {
          prompt: z.string().describe('A text description of the image you want to generate.')
        },
        async ({ prompt }) =&gt; {
          const response = await this.env.AI.run('@cf/black-forest-labs/flux-1-schnell', { 
            prompt, 
            steps: 8 
          })
          return {
            content: [{ type: 'image', data: response.image!, mimeType: 'image/jpeg' }],
          }
        }
      )
    }
  }
}
</code></pre>
            
    <div>
      <h2>Introducing McpAgent: remote transport support that works today, and will work with the revision to the MCP spec</h2>
      <a href="#introducing-mcpagent-remote-transport-support-that-works-today-and-will-work-with-the-revision-to-the-mcp-spec">
        
      </a>
    </div>
    <p>The next step to opening up MCP beyond your local machine is to open up a remote transport layer for communication. MCP servers you run on your local machine just communicate over <a href="https://modelcontextprotocol.io/docs/concepts/transports#standard-input%2Foutput-stdio"><u>stdio</u></a>, but for an MCP server to be callable over the Internet, it must implement <a href="https://spec.modelcontextprotocol.io/specification/draft/basic/transports/#http-with-sse"><u>remote transport</u></a>.</p><p>The <a href="https://github.com/cloudflare/agents/blob/2f82f51784f4e27292249747b5fbeeef94305552/packages/agents/src/mcp.ts"><u>McpAgent</u></a> class we introduced today as part of our <a href="https://github.com/cloudflare/agents"><u>Agents SDK</u></a> handles this for you, using <a href="https://developers.cloudflare.com/durable-objects/"><u>Durable Objects</u></a> behind the scenes to hold a persistent connection open, so that the MCP client can send <a href="https://modelcontextprotocol.io/docs/concepts/transports#server-sent-events-sse"><u>server-sent events (SSE)</u></a> to your MCP server. You don’t have to write code to deal with transport or serialization yourself. A minimal MCP server in 15 lines of code can look like this:</p>
            <pre><code>import { McpAgent } from "agents/mcp";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

export class MyMCP extends McpAgent {
  server = new McpServer({
    name: "Demo",
    version: "1.0.0",
  });
  async init() {
    this.server.tool("add", { a: z.number(), b: z.number() }, async ({ a, b }) =&gt; ({
      content: [{ type: "text", text: String(a + b) }],
    }));
  }
}</code></pre>
            <p>After much <a href="https://github.com/modelcontextprotocol/specification/discussions/102"><u>discussion</u></a>, remote transport in the MCP spec is changing, with <a href="https://github.com/modelcontextprotocol/specification/pull/206"><u>Streamable HTTP replacing HTTP+SSE</u></a>. This allows for stateless, pure HTTP connections to MCP servers, with an option to upgrade to SSE, and removes the need for the MCP client to send messages to a separate endpoint from the one it first connects to. The McpAgent class will change with it and just work with streamable HTTP, so that you don’t have to start over to support the revision to how transport works.</p><p>This applies to future iterations of transport as well. Today, the vast majority of MCP servers only expose tools, which are simple <a href="https://en.wikipedia.org/wiki/Remote_procedure_call"><u>remote procedure call (RPC)</u></a> methods that can be provided by a stateless transport. But more complex human-in-the-loop and agent-to-agent interactions will need <a href="https://modelcontextprotocol.io/docs/concepts/prompts"><u>prompts</u></a> and <a href="https://modelcontextprotocol.io/docs/concepts/sampling"><u>sampling</u></a>. We expect these types of chatty, two-way interactions will need to be real-time, which will be challenging to do well without a bidirectional transport layer. When that time comes, Cloudflare, the <a href="https://developers.cloudflare.com/agents/"><u>Agents SDK</u></a>, and Durable Objects all natively support <a href="https://developers.cloudflare.com/durable-objects/best-practices/websockets/"><u>WebSockets</u></a>, which enable full-duplex, bidirectional real-time communication.</p>
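<p>Whichever transport is used, the messages themselves keep the same shape: MCP exchanges JSON-RPC 2.0 messages. As a sketch (the <code>toolCallMessage</code> helper is our own illustration), this is the envelope a client sends to invoke a tool:</p>

```typescript
// Sketch (helper name is ours): the JSON-RPC 2.0 envelope MCP uses for a
// tool invocation. The transport (stdio, HTTP+SSE, or Streamable HTTP)
// only changes how this message travels, not its shape.
function toolCallMessage(
  id: number,
  name: string,
  args: Record<string, unknown>
) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// For example, calling the "add" tool from the server above:
const msg = toolCallMessage(1, "add", { a: 2, b: 3 });
```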
    <div>
      <h2>Stateful, agentic MCP servers</h2>
      <a href="#stateful-agentic-mcp-servers">
        
      </a>
    </div>
    <p>When you build MCP servers on Cloudflare, each MCP client session is backed by a Durable Object, via the <a href="https://developers.cloudflare.com/agents/"><u>Agents SDK</u></a>. This means each session can manage and persist its own state, <a href="https://developers.cloudflare.com/agents/api-reference/store-and-sync-state/"><u>backed by its own SQL database</u></a>.</p><p>This opens the door to building stateful MCP servers. Rather than just acting as a stateless layer between a client app and an external API, MCP servers on Cloudflare can themselves be stateful applications — games, a shopping cart plus checkout flow, a <a href="https://github.com/modelcontextprotocol/servers/tree/main/src/memory"><u>persistent knowledge graph</u></a>, or anything else you can dream up. When you build on Cloudflare, MCP servers can be much more than a layer in front of your REST API.</p><p>To understand the basics of how this works, let’s look at a minimal example that increments a counter:</p>
            <pre><code>import { McpAgent } from "agents/mcp";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

type State = { counter: number }

export class MyMCP extends McpAgent&lt;Env, State, {}&gt; {
  server = new McpServer({
    name: "Demo",
    version: "1.0.0",
  });

  initialState: State = {
    counter: 1,
  }

  async init() {
    this.server.resource(`counter`, `mcp://resource/counter`, (uri) =&gt; {
      return {
        contents: [{ uri: uri.href, text: String(this.state.counter) }],
      }
    })

    this.server.tool('add', 'Add to the counter, stored in the MCP', { a: z.number() }, async ({ a }) =&gt; {
      this.setState({ ...this.state, counter: this.state.counter + a })

      return {
        content: [{ type: 'text', text: String(`Added ${a}, total is now ${this.state.counter}`) }],
      }
    })
  }

  onStateUpdate(state: State) {
    console.log({ stateUpdate: state })
  }

}</code></pre>
            <p>For a given session, the MCP server above will remember the state of the counter across tool calls.</p><p>From within an MCP server, you can use Cloudflare’s whole developer platform, and have your MCP server <a href="https://developers.cloudflare.com/agents/api-reference/browse-the-web/"><u>spin up its own web browser</u></a>, <a href="https://developers.cloudflare.com/agents/api-reference/run-workflows/"><u>trigger a Workflow</u></a>, <a href="https://developers.cloudflare.com/agents/api-reference/using-ai-models/"><u>call AI models</u></a>, and more. We’re excited to see the MCP ecosystem evolve into more advanced use cases.</p>
    <div>
      <h2>Connect to remote MCP servers from MCP clients that today only support local MCP</h2>
      <a href="#connect-to-remote-mcp-servers-from-mcp-clients-that-today-only-support-local-mcp">
        
      </a>
    </div>
    <p>Cloudflare is supporting remote MCP early — before the most prominent MCP client applications support remote, authenticated MCP, and before other platforms support remote MCP. We’re doing this to give you a head start building for where MCP is headed.</p><p>But if you build a remote MCP server today, this presents a challenge — how can people start using your MCP server if there aren’t MCP clients that support remote MCP?</p><p>We have two new tools that allow you to test your remote MCP server and simulate how users will interact with it in the future:</p><p>We updated the <a href="https://playground.ai.cloudflare.com/"><u>Workers AI Playground</u></a> to be a fully remote MCP client that allows you to connect to any remote MCP server with built-in authentication support. This online chat interface lets you immediately test your remote MCP servers without having to install anything on your device. Instead, just enter the remote MCP server’s URL (e.g. https://remote-server.example.com/sse) and click Connect.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4N64nJHJiQygmMdSK7clIs/c0bf8c64f1607674f81be10c3871a64b/image1.png" />
          </figure><p>Once you click Connect, you’ll go through the authentication flow (if you set one up), and after that, you will be able to interact with the MCP server tools directly from the chat interface.</p><p>If you prefer to use a client like Claude Desktop or Cursor that already supports MCP but doesn’t yet handle remote connections with authentication, you can use <a href="https://www.npmjs.com/package/mcp-remote"><u>mcp-remote</u></a>. mcp-remote is an adapter that lets MCP clients that otherwise only support local connections work with remote MCP servers. This gives you and your users the ability to preview what interactions with your remote MCP server will be like from the tools you’re already using today, without having to wait for the client to support remote MCP natively.</p><p>We’ve <a href="https://developers.cloudflare.com/agents/guides/test-remote-mcp-server/"><u>published a guide</u></a> on how to use mcp-remote with popular MCP clients including Claude Desktop, Cursor, and Windsurf. In Claude Desktop, you add the following to your configuration file:</p>
            <pre><code>{
  "mcpServers": {
    "remote-example": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://remote-server.example.com/sse"
      ]
    }
  }
}</code></pre>
            
    <div>
      <h2>1800-mcp@cloudflare.com — start building remote MCP servers today</h2>
      <a href="#1800-mcp-cloudflare-com-start-building-remote-mcp-servers-today">
        
      </a>
    </div>
    <p>Remote Model Context Protocol (MCP) is coming! When client apps support remote MCP servers, the audience of people who can use them opens up from just us developers to the rest of the population — who may never even know what MCP is or stands for.</p><p>Building a remote MCP server is the way to bring your service into the AI assistants and tools that millions of people use. We’re excited to see that many of the biggest companies on the Internet are busy building MCP servers right now, and we are curious about the businesses that pop up in an agent-first, MCP-native way.</p><p>On Cloudflare, <a href="https://developers.cloudflare.com/agents/guides/remote-mcp-server/"><u>you can start building today</u></a>. We’re ready for you, and ready to help build with you. Email us at <a href="mailto:1800-mcp@cloudflare.com"><u>1800-mcp@cloudflare.com</u></a>, and we’ll help get you going. There’s lots more to come with MCP, and we’re excited to see what you build.</p>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[MCP]]></category>
            <category><![CDATA[Agents]]></category>
            <guid isPermaLink="false">4e3J8mxEIN24iNKfw9ToEH</guid>
            <dc:creator>Brendan Irvine-Broque</dc:creator>
            <dc:creator>Dina Kozlov</dc:creator>
            <dc:creator>Glen Maddern</dc:creator>
        </item>
        <item>
            <title><![CDATA[Wrapping up another Birthday Week celebration]]></title>
            <link>https://blog.cloudflare.com/birthday-week-2024-wrap-up/</link>
            <pubDate>Mon, 30 Sep 2024 13:00:00 GMT</pubDate>
            <description><![CDATA[ Recapping all the big announcements made during 2024’s Birthday Week. ]]></description>
            <content:encoded><![CDATA[ <p>2024 marks Cloudflare’s 14th birthday. Birthday Week each year is packed with major announcements and the release of innovative new offerings, all focused on giving back to our customers and the broader Internet community. Birthday Week has become a proud tradition at Cloudflare and part of our culture: a way not just to stay true to our mission, but to always stay close to our customers. We begin planning for this week of celebration earlier in the year and invite everyone at Cloudflare to participate.</p><p>Months before Birthday Week, we invited teams to submit ideas for what to announce. We were flooded with submissions, from proposals for implementing new standards to creating new products for developers. Our biggest challenge is finding space for it all in just one week — there is still so much to build. Good thing we have a birthday to celebrate each year, but we might need an extra day in Birthday Week next year!</p><p>In case you missed it, here’s everything we announced during 2024’s Birthday Week:</p>
    <div>
      <h3>Monday</h3>
      <a href="#monday">
        
      </a>
    </div>
    <div>
    <figure>
        <table>
            <colgroup>
                <col></col>
                <col></col>
            </colgroup>
            <tbody>
                <tr>
                    <td>
                        <p><span><span>What</span></span></p>
                    </td>
                    <td>
                        <p><span><span>In a sentence…</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><a href="https://blog.cloudflare.com/cloudflare-ai-audit-control-ai-content-crawlers"><span><span><u>Start auditing and controlling the AI models accessing your content</u></span></span></a></p>
                    </td>
                    <td>
                        <p><span><span>Understand which AI-related bots and crawlers can access your website, and which content you choose to allow them to consume.</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><a href="https://blog.cloudflare.com/batched-dns-changes/"><span><span><u>Making zone management more efficient with batch DNS record updates</u></span></span></a></p>
                    </td>
                    <td>
                        <p><span><span>Customers using Cloudflare to manage DNS can create a whole batch of records, enable </span></span><a href="https://developers.cloudflare.com/dns/manage-dns-records/reference/proxied-dns-records/"><span><span>proxying</span></span></a><span><span> on many records, update many records to point to a new target at the same time, or even delete all of their records.</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><a href="https://blog.cloudflare.com/turnstile-ephemeral-ids-for-fraud-detection"><span><span><u>Introducing Ephemeral IDs: a new tool for fraud detection</u></span></span></a></p>
                    </td>
                    <td>
                        <p><span><span>Taking the next step in advancing security with Ephemeral IDs, a new feature that generates a unique short-lived ID, without relying on any network-level information.</span></span></p>
                    </td>
                </tr>
            </tbody>
        </table>
    </figure>
</div>
    <div>
      <h3>Tuesday</h3>
      <a href="#tuesday">
        
      </a>
    </div>
    <div>
    <figure>
        <table>
            <colgroup>
                <col></col>
                <col></col>
            </colgroup>
            <tbody>
                <tr>
                    <td>
                        <p><span><span>What</span></span></p>
                    </td>
                    <td>
                        <p><span><span>In a sentence…</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><a href="https://blog.cloudflare.com/safer-resolver/"><span><span><u>Cloudflare partners to deliver safer browsing experience to homes</u></span></span></a></p>
                    </td>
                    <td>
                        <p><span><span>Internet service, network, and hardware equipment providers can </span></span><a href="https://docs.google.com/spreadsheets/d/1ZIBbVz2gqPBsldhszk_Wo2eZeNwAZ5Mf9xSssxRrTuc/edit?resourcekey=&amp;gid=386353769#gid=386353769"><span><span><u>sign up</u></span></span></a><span><span> and partner with Cloudflare to deliver a safer browsing experience to homes.</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><a href="https://blog.cloudflare.com/a-safer-internet-with-cloudflare/"><span><span><u>A safer Internet with Cloudflare: free threat intelligence, analytics, and new threat detections</u></span></span></a></p>
                    </td>
                    <td>
                        <p><span><span>Free threat intelligence, analytics, new threat detections, and more.</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><a href="https://blog.cloudflare.com/automatically-generating-cloudflares-terraform-provider/"><span><span><u>Automatically generating Cloudflare’s Terraform provider</u></span></span></a></p>
                    </td>
                    <td>
                        <p><span><span>The Terraform provider and API reference documentation are now automatically generated, the last pieces of the OpenAPI schemas ecosystem to be produced this way.</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><a href="https://blog.cloudflare.com/key-transparency/"><span><span><u>Cloudflare helps verify the security of end-to-end encrypted messages by auditing key transparency for WhatsApp</u></span></span></a></p>
                    </td>
                    <td>
                        <p><span><span>Cloudflare now serves as a third-party auditor of WhatsApp’s key transparency infrastructure, helping users verify that their end-to-end encrypted messages are protected by the right keys.</span></span></p>
                    </td>
                </tr>
            </tbody>
        </table>
    </figure>
</div>
    <div>
      <h3>Wednesday</h3>
      <a href="#wednesday">
        
      </a>
    </div>
    <div>
    <figure>
        <table>
            <colgroup>
                <col></col>
                <col></col>
            </colgroup>
            <tbody>
                <tr>
                    <td>
                        <p><span><span>What</span></span></p>
                    </td>
                    <td>
                        <p><span><span>In a sentence…</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><a href="https://blog.cloudflare.com/introducing-speed-brain/"><span><span><u>Introducing Speed Brain: helping web pages load 45% faster</u></span></span></a></p>
                    </td>
                    <td>
                        <p><span><span>Speed Brain, our latest leap forward in speed, uses the Speculation Rules API to prefetch content for users' likely next navigations — downloading web pages before they navigate to them and making pages load 45% faster.</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><a href="https://blog.cloudflare.com/instant-purge/"><span><span><u>Instant Purge: invalidating cached content in under 150ms</u></span></span></a></p>
                    </td>
                    <td>
                        <p><span><span>Instant Purge invalidates cached content in under 150ms, offering the industry's fastest cache purge with global latency for purges by tags, hostnames, and prefixes.</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><a href="https://blog.cloudflare.com/new-standards/"><span><span><u>New standards for a faster and more private Internet</u></span></span></a></p>
                    </td>
                    <td>
                        <p><span><span>Zstandard compression, Encrypted Client Hello, and more speed and privacy announcements all released for free.</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><a href="https://blog.cloudflare.com/webrtc-turn-using-anycast/"><span><span><u>TURN and anycast: making peer connections work globally</u></span></span></a></p>
                    </td>
                    <td>
                        <p><a href="https://developers.cloudflare.com/calls/turn/"><span><span>Cloudflare Calls’ TURN service</span></span></a><span><span> is now generally available to all Cloudflare accounts.</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><a href="https://blog.cloudflare.com/gen-12-servers"><span><span><u>Cloudflare’s 12th Generation servers — 145% more performant and 63% more efficient</u></span></span></a></p>
                    </td>
                    <td>
                        <p><span><span>Next generation servers focused on exceptional performance and security, enhanced support for AI/ML workloads, and significant strides in power efficiency.</span></span></p>
                    </td>
                </tr>
            </tbody>
        </table>
    </figure>
</div>
    <div>
      <h3>Thursday</h3>
      <a href="#thursday">
        
      </a>
    </div>
    <div>
    <figure>
        <table>
            <colgroup>
                <col></col>
                <col></col>
            </colgroup>
            <tbody>
                <tr>
                    <td>
                        <p><span><span>What</span></span></p>
                    </td>
                    <td>
                        <p><span><span>In a sentence…</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><a href="https://blog.cloudflare.com/startup-program-250k-credits"><span><span><u>Startup Program revamped: build and grow on Cloudflare with up to $250,000 in credits</u></span></span></a></p>
                    </td>
                    <td>
                        <p><span><span>Eligible startups can now apply to receive up to $250,000 in credits to build using Cloudflare's Developer Platform.</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><a href="https://blog.cloudflare.com/workers-ai-bigger-better-faster"><span><span><u>Cloudflare’s bigger, better, faster AI platform </u></span></span></a></p>
                    </td>
                    <td>
                        <p><span><span>More powerful GPUs, expanded model support, enhanced logging and evaluations in AI Gateway, and Vectorize GA with larger index sizes and faster queries.</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><a href="https://blog.cloudflare.com/builder-day-2024-announcements"><span><span><u>Builder Day 2024: 18 big updates to the Workers platform</u></span></span></a></p>
                    </td>
                    <td>
                        <p><span><span>Persistent and queryable Workers logs, Node.js compatibility GA, improved Next.js support via OpenNext, built-in CI/CD for Workers, Gradual Deployments, Queues, and R2 Event Notifications GA, and more — making building on Cloudflare easier, faster, and more affordable.</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><a href="https://blog.cloudflare.com/faster-workers-kv"><span><span><u>Faster Workers KV</u></span></span></a></p>
                    </td>
                    <td>
                        <p><span><span>A deep dive into how we made Workers KV up to 3x faster.</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><a href="https://blog.cloudflare.com/sqlite-in-durable-objects"><span><span><u>Zero-latency SQLite storage in every Durable Object</u></span></span></a></p>
                    </td>
                    <td>
                        <p><span><span>Putting your application code into the storage layer, so your code runs where the data is stored.</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><a href="https://blog.cloudflare.com/making-workers-ai-faster/"><span><span><u>Making Workers AI faster and more efficient: Performance optimization with KV cache compression and speculative decoding</u></span></span></a></p>
                    </td>
                    <td>
                        <p><span><span>Using new optimization techniques such as KV cache compression and speculative decoding, we’ve made large language model (LLM) inference lightning-fast on the Cloudflare Workers AI platform.</span></span></p>
                    </td>
                </tr>
            </tbody>
        </table>
    </figure>
</div>
    <div>
      <h3>Friday</h3>
      <a href="#friday">
        
      </a>
    </div>
    <div>
    <figure>
        <table>
            <colgroup>
                <col></col>
                <col></col>
            </colgroup>
            <tbody>
                <tr>
                    <td>
                        <p><span><span>What</span></span></p>
                    </td>
                    <td>
                        <p><span><span>In a sentence…</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><a href="https://blog.cloudflare.com/container-platform-preview"><span><span><u>Our container platform is in production. It has GPUs. Here’s an early look.</u></span></span></a></p>
                    </td>
                    <td>
                        <p><span><span>We’ve been working on something new — a platform for running containers across Cloudflare’s network. We already use it in production, for AI inference and more.</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><a href="https://blog.cloudflare.com/cisa-pledge-commitment-bug-bounty-vip"><span><span><u>Advancing cybersecurity: Cloudflare implements a new bug bounty VIP program as part of CISA Pledge commitment</u></span></span></a></p>
                    </td>
                    <td>
                        <p><span><span>We implemented a new bug bounty VIP program this year as part of our CISA Pledge commitment.</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><a href="https://blog.cloudflare.com/launchpad-cohort4-dev-starter-pack/"><span><span><u>Empowering builders: introducing the Dev Alliance and Workers Launchpad Cohort #4</u></span></span></a></p>
                    </td>
                    <td>
                        <p><span><span>Get free and discounted access to essential developer tools and meet the latest set of incredible startups building on Cloudflare.</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><a href="https://blog.cloudflare.com/expanding-our-support-for-oss-projects-with-project-alexandria"><span><span><u>Expanding our support for open source projects with Project Alexandria</u></span></span></a></p>
                    </td>
                    <td>
                        <p><span><span>Expanding our open source program and helping projects have a sustainable and scalable future, providing tools and protection needed to thrive.</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><a href="https://blog.cloudflare.com/radar-data-explorer-ai-assistant"><span><span><u>Network trends and natural language: Cloudflare Radar’s new Data Explorer &amp; AI Assistant</u></span></span></a></p>
                    </td>
                    <td>
                        <p><span><span>A simple Web-based interface to build more complex API queries, including comparisons and filters, and visualize the results.</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><a href="https://blog.cloudflare.com/bringing-ai-to-cloudflare"><span><span><u>AI Everywhere with the WAF Rule Builder Assistant, Cloudflare Radar AI Insights, and updated AI bot protection</u></span></span></a></p>
                    </td>
                    <td>
                        <p><span><span>Extending our AI Assistant capabilities to help you build new WAF rules, added new AI bot and crawler traffic insights to Radar, and new AI bot blocking capabilities.</span></span></p>
                    </td>
                </tr>
                <tr>
                    <td>
                        <p><a href="https://blog.cloudflare.com/cloudflares-commitment-to-free"><span><span><u>Reaffirming our commitment to Free</u></span></span></a></p>
                    </td>
                    <td>
                        <p><span><span>Our Free plan is here to stay, and we reaffirm that commitment this week with 15 releases that make the Free plan even better.</span></span></p>
                    </td>
                </tr>
            </tbody>
        </table>
    </figure>
</div>
    <div>
      <h2>One more thing…</h2>
      <a href="#one-more-thing">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5FReOqd5AHo8vTgSmY6qe6/1ae02d93ec9d9af2f60c0b6024017f58/image3.png" />
          </figure><p>Cloudflare serves millions of customers and their millions of domains across nearly every country on Earth. However, as a global company, the payment landscape can be complex — especially in regions outside of North America. While credit cards are very popular for online purchases in the US, the global picture is quite different. <a href="https://www.fisglobal.com/-/media/fisglobal/files/campaigns/global-payments%20report/FIS_TheGlobalPaymentsReport_2023.pdf"><u>60% of consumers across EMEA, APAC and LATAM choose alternative payment methods</u></a>. For instance, European consumers often opt for SEPA Direct Debit, a bank transfer mechanism, while Chinese consumers frequently use Alipay, a digital wallet.</p><p>At Cloudflare, we saw this as an opportunity to meet customers where they are. Today, we're thrilled to announce that we are expanding our payment system and launching a closed beta for a new payment method called <a href="https://www.cloudflare.com/lp/cloudflare-introduces-stripe-link/"><u>Stripe Link</u></a>. The checkout experience will be faster and more seamless, allowing our self-serve customers to pay using saved bank accounts or cards with Link. Customers who have saved their payment details at any business using Link can quickly check out without having to reenter their payment information.</p><p>These are the first steps in our efforts to expand our payment system to support global payment methods used by customers around the world. We'll be rolling out new payment methods gradually, ensuring a smooth integration and gathering feedback from our customers every step of the way.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/v0v7QBRWeGSfArq6jE5eg/7d8d79cbfe3f63386db52469c4727d21/image2.png" />
          </figure>
    <div>
      <h2>Until next year</h2>
      <a href="#until-next-year">
        
      </a>
    </div>
    <p>That’s all for Birthday Week 2024. However, the innovation never stops at Cloudflare. Continue to follow the <a href="https://blog.cloudflare.com/"><u>Cloudflare Blog</u></a> all year long as we launch more products and features that help build a better Internet.</p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Partners]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Workers Launchpad]]></category>
            <category><![CDATA[Workers AI]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Turnstile]]></category>
            <category><![CDATA[Performance]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Cache]]></category>
            <category><![CDATA[Speed]]></category>
            <category><![CDATA[Speed Brain]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[AI]]></category>
            <guid isPermaLink="false">65JnLP0MYKVzwTyOsItRJk</guid>
            <dc:creator>Kelly May Johnston</dc:creator>
            <dc:creator>Brendan Irvine-Broque</dc:creator>
        </item>
        <item>
            <title><![CDATA[Our container platform is in production. It has GPUs. Here’s an early look]]></title>
            <link>https://blog.cloudflare.com/container-platform-preview/</link>
            <pubDate>Fri, 27 Sep 2024 13:00:00 GMT</pubDate>
            <description><![CDATA[ We’ve been working on something new — a platform for running containers across Cloudflare’s network. We already use it in production, for AI inference and more. Today we want to share an early look at ]]></description>
            <content:encoded><![CDATA[ <p>We’ve been working on something new — a platform for running containers across Cloudflare’s network. We already use it in production for <a href="https://developers.cloudflare.com/workers-ai/"><u>Workers AI</u></a>, <a href="https://developers.cloudflare.com/workers/ci-cd/"><u>Workers Builds</u></a>, <a href="https://www.cloudflare.com/zero-trust/products/browser-isolation/"><u>Remote Browser Isolation</u></a>, and the <a href="https://developers.cloudflare.com/browser-rendering/"><u>Browser Rendering API</u></a>. Today, we want to share an early look at how it’s built, why we built it, and how we use it ourselves.</p><p>In 2024, Cloudflare Workers celebrates its 7th birthday. <a href="https://blog.cloudflare.com/introducing-cloudflare-workers/"><u>When we first announced Workers</u></a>, it was a completely new model for running compute in a multi-tenant way — on isolates, as opposed to containers. While, at the time, Workers was a pretty bare-bones functions-as-a-service product, we took a big bet that this was how software would be written going forward. 
Since introducing Workers, in addition to expanding our developer products in general to include storage and AI, we have been steadily adding more compute capabilities to Workers:</p><table><tr><td><p><b>2020</b></p></td><td><p><a href="https://blog.cloudflare.com/introducing-cron-triggers-for-cloudflare-workers/"><u>Cron Triggers</u></a></p></td></tr><tr><td><p><b>2021</b></p></td><td><p><a href="https://blog.cloudflare.com/durable-objects-easy-fast-correct-choose-three/"><u>Durable Objects</u></a></p><p><a href="https://blog.cloudflare.com/workers-rust-sdk/"><u>Write Workers in Rust</u></a></p><p><a href="https://blog.cloudflare.com/introducing-worker-services/"><u>Service Bindings</u></a></p></td></tr><tr><td><p><b>2022</b></p></td><td><p><a href="https://blog.cloudflare.com/introducing-cloudflare-queues/"><u>Queues</u></a></p><p><a href="https://blog.cloudflare.com/announcing-route-to-workers/"><u>Email Workers</u></a></p><p><a href="https://blog.cloudflare.com/durable-objects-alarms/"><u>Durable Objects Alarms</u></a></p></td></tr><tr><td><p><b>2023</b></p></td><td><p><a href="https://blog.cloudflare.com/workers-tcp-socket-api-connect-databases/"><u>Workers TCP Socket API</u></a> </p><p><a href="https://blog.cloudflare.com/hyperdrive-making-regional-databases-feel-distributed/"><u>Hyperdrive</u></a></p><p><a href="https://blog.cloudflare.com/announcing-workers-smart-placement/"><u>Smart Placement</u></a></p><p><a href="https://blog.cloudflare.com/best-place-region-earth-inference/"><u>Workers AI</u></a></p></td></tr><tr><td><p><b>2024</b></p></td><td><p><a href="https://blog.cloudflare.com/python-workers/"><u>Python Workers</u></a></p><p><a href="https://blog.cloudflare.com/javascript-native-rpc/"><u>JavaScript-native RPC</u></a></p><p><a href="https://blog.cloudflare.com/more-npm-packages-on-cloudflare-workers-combining-polyfills-and-native-code/"><u>Node.js compatibility</u></a></p><p><a href="https://blog.cloudflare.com/sqlite-in-durable-objects"><u>SQLite 
in Durable Objects</u></a></p></td></tr></table><p>With each of these, we’ve faced a question — can we build this natively into the platform, in a way that removes, rather than adds, complexity? Can we build it in a way that lets developers focus on building and shipping, rather than managing infrastructure, so that they don’t have to be distributed systems engineers to build distributed systems?</p><p>In each instance, the answer has been YES. We try to solve problems in a way that simplifies things for developers in the long run, even if that is the harder path for us to take ourselves. If we didn’t, you’d be right to ask — why not self-host and manage all of this myself? What’s the point of the cloud if I’m still provisioning and managing infrastructure? These are the questions many are asking today about the earlier generation of cloud providers.</p><p>Pushing ourselves to build platform-native products and features helped us answer this question, particularly because some of these products actually use containers behind the scenes, even though as a developer you never interact with or think about containers yourself.</p><p>If you’ve used AI inference on GPUs with <a href="https://developers.cloudflare.com/workers-ai/"><u>Workers AI</u></a>, spun up headless browsers with <a href="https://developers.cloudflare.com/browser-rendering/"><u>Browser Rendering</u></a>, or enqueued build jobs with the new <a href="https://developers.cloudflare.com/workers/ci-cd/"><u>Workers Builds</u></a>, you’ve run containers on our network, without even knowing it. But to do so, we needed to be able to run untrusted code across Cloudflare’s network, outside a <a href="https://developers.cloudflare.com/workers/reference/how-workers-works/#isolates"><u>v8 isolate</u></a>, in a way that fits what we promise:</p><ol><li><p>You shouldn’t have to think about regions or data centers. 
Routing, scaling, load balancing, scheduling, and capacity are our problem to solve, not yours, with tools like <a href="https://blog.cloudflare.com/announcing-workers-smart-placement/"><u>Smart Placement</u></a>.</p></li><li><p>You should be able to build distributed systems without being a distributed systems engineer.</p></li><li><p>Every millisecond matters — Cloudflare has to be fast.</p></li></ol><p>There wasn’t an off-the-shelf container platform that met our needs, so we built it ourselves — from scheduling and IP address management to pulling and caching images, improving startup times, and more. Our container platform powers many of our newest products, so we wanted to share how we built it, optimized it, and well, you can probably guess what’s next.</p>
    <div>
      <h2>Global scheduling — “The Network is the Computer”</h2>
      <a href="#global-scheduling-the-network-is-the-computer">
        
      </a>
    </div>
    <p>Cloudflare serves the entire world — region: earth. Rather than asking developers to provision resources in specific regions, data centers, and availability zones, we think <a href="https://blog.cloudflare.com/the-network-is-the-computer/"><u>“The Network is the Computer”</u></a>. When you build on Cloudflare, you build software that runs on the Internet, not just in a data center.</p><p>When we started working on this, Cloudflare’s architecture was to just run every service via <a href="https://systemd.io/"><u>systemd</u></a> on every server (we call them “metals” — we <a href="https://blog.cloudflare.com/gen-12-servers"><u>run our own hardware</u></a>), allowing all services to take advantage of new capacity we add to our network. That fits running <a href="https://blog.cloudflare.com/upgrading-one-of-the-oldest-components-in-cloudflare-software-stack/"><u>NGINX</u></a> and a few dozen other services, but cannot accommodate a world where we need to run <i>many thousands</i> of different compute-heavy, resource-hungry workloads. We’d <a href="https://blog.cloudflare.com/its-crowded-in-here/"><u>run out of space</u></a> just trying to load all of them! Consider a canonical AI workload — deploying <a href="https://blog.cloudflare.com/meta-llama-3-1-available-on-workers-ai/"><u>Llama 3.1 8B</u></a> to an inference server. If we simply ran a Llama 3.1 8B service on every Cloudflare metal, we’d have no flexibility to use GPUs for the <a href="https://developers.cloudflare.com/workers-ai/models/"><u>many other models</u></a> that Workers AI supports.</p><p>We needed something that would allow us to still take advantage of the full capacity of Cloudflare’s entire network, not just the capacity of individual machines, and ideally without putting that burden on the developer.</p><p>The answer: we built a control plane on our own Developer Platform that lets us schedule a container anywhere on <a href="https://www.cloudflare.com/en-gb/network/"><u>Cloudflare’s Network</u></a>:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/63nPDnWzVLvx2OB9ZPZXM5/d08be3dd36fbd08bb7205a82b20bf90e/BLOG-2573_2.png" />
          </figure><p>The global scheduler is built on Cloudflare <a href="https://developers.cloudflare.com/workers/"><u>Workers</u></a>, <a href="https://developers.cloudflare.com/durable-objects/"><u>Durable Objects</u></a>, and <a href="https://developers.cloudflare.com/kv/"><u>KV</u></a>, and decides which Cloudflare location to schedule the container to run in. Each location then runs <i>its own</i> scheduler, which decides which metals within that location to schedule the container to run on. Location schedulers monitor compute capacity, and expose this to the global scheduler. This allows Cloudflare to dynamically place workloads based on capacity and hardware availability (e.g. multiple types of GPUs) across our network.</p>
    <div>
      <h2>Why does global scheduling matter?</h2>
      <a href="#why-does-global-scheduling-matter">
        
      </a>
    </div>
    <p>When you run compute on a first-generation cloud, the “contract” between the developer and the platform is that the developer must specify what runs where. This is regional scheduling, the status quo.</p><p>Let’s imagine for a second that we applied regional scheduling to running compute on Cloudflare’s network, with locations in <a href="https://www.cloudflare.com/en-gb/network/"><u>330+ cities, across 120+ countries</u></a>. One of the obvious reasons people tell us they want to run on Cloudflare is because we have compute in places where others don’t, within 50ms of 95% of the world’s Internet-connected population. In South America, other clouds have one region in one city. Cloudflare has 19:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/53Ba51lqKDRNQPbEBselC3/0b9cd69b4f9fa29dad355df0076d05b9/BLOG-2573_3.png" />
          </figure><p>Running anywhere means you can be faster, highly available, and have more control over data location. But with regional scheduling, the more locations you run in, the more work you have to do. You configure and manage load balancing, routing, auto-scaling policies, and more. Balancing performance and cost in a multi-region setup is literally a full-time job (or more) at most companies that have reached meaningful scale on traditional clouds.</p><p>But most importantly, no matter what tools you bring, you’re the one who told the cloud provider, “run this container over here”. The cloud platform can’t move it for you, even if moving it would make your workload faster. This prevents the platform from adding locations, because for each location, it has to convince developers to take action themselves to move their compute workloads to the new location. Each new location carries a risk that developers won’t migrate workloads to it, or migrate too slowly, delaying the return on investment.</p><p>Global scheduling means Cloudflare can add capacity and use it immediately, letting you benefit from it. The “contract” between us and our customers isn’t tied to a specific datacenter or region, so we have permission to move workloads around to benefit customers. This flexibility plays an essential role in all of our own uses of our container platform, starting with GPUs and AI.</p>
    <div>
      <h2>GPUs everywhere: Scheduling large images with Workers AI</h2>
      <a href="#gpus-everywhere-scheduling-large-images-with-workers-ai">
        
      </a>
    </div>
    <p>In late 2023, <a href="https://blog.cloudflare.com/workers-ai/"><u>we launched Workers AI</u></a>, which provides fast, easy-to-use, and affordable GPU-backed AI inference.</p><p>The more efficiently we can use our capacity, the better pricing we can offer. And the faster we can change which models run in which Cloudflare locations, the closer we can move AI inference to the application, <a href="https://blog.cloudflare.com/making-workers-ai-faster/"><u>lowering Time to First Token (TTFT)</u></a>. This also allows us to be more resilient to spikes in inference requests.</p><p>AI models that rely on GPUs present three challenges, though:</p><ol><li><p>Models have different GPU memory needs. GPU memory is the scarcest resource, and different GPUs have different amounts of memory.</p></li><li><p>Not all container runtimes support GPU drivers and other dependencies; <a href="https://github.com/firecracker-microvm/firecracker/issues/1179"><u>Firecracker</u></a>, for example, does not.</p></li><li><p>AI models, particularly LLMs, are very large. Even a smaller model, like <a href="https://developers.cloudflare.com/workers-ai/models/llama-3.1-8b-instruct/"><u>@cf/meta/llama-3.1-8b-instruct</u></a>, is at least 5 GB. The larger the model, the more bytes we need to pull across the network when scheduling a model to run in a new location.</p></li></ol><p>Let’s dive into how we solved each of these…</p><p>First, GPU memory needs. The global scheduler knows which Cloudflare locations have blocks of GPU memory available, and then delegates scheduling the workload on a specific metal to the local scheduler. This allows us to prioritize placement of AI models that use a large amount of GPU memory, and then move smaller models to other machines in the same location. By doing this, we maximize the overall number of locations that we run AI models in, and maximize our efficiency.</p><p>Second, container runtimes and GPU support. 
Thankfully, from day one we built our container platform to be <i>runtime agnostic</i>. Using <a href="https://blog.cloudflare.com/how-we-use-hashicorp-nomad/"><u>a runtime-agnostic scheduler</u></a>, we’re able to support <a href="https://gvisor.dev/"><u>gVisor</u></a>, <a href="https://firecracker-microvm.github.io/"><u>Firecracker</u></a> microVMs, and traditional VMs with <a href="https://www.qemu.org/"><u>QEMU</u></a>. We are also evaluating adding support for another one: <a href="https://github.com/cloud-hypervisor/cloud-hypervisor"><u>cloud-hypervisor</u></a>, which is based on <a href="https://github.com/rust-vmm"><u>rust-vmm</u></a> and has a few compelling advantages for our use case:</p><ol><li><p>GPU passthrough <a href="https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/docs/vfio.md"><u>support</u></a> using VFIO</p></li><li><p><a href="https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/docs/device_model.md#vhost-user-net"><u>vhost-user-net</u></a> support, enabling high throughput between the host network interface and the VM</p></li><li><p><a href="https://github.com/cloud-hypervisor/cloud-hypervisor/blob/main/docs/device_model.md#vhost-user-net"><u>vhost-user-blk</u></a> support, adding flexibility to introduce novel network-based storage backed by other Cloudflare Workers products</p></li><li><p>all the while being a smaller codebase than QEMU, written in a memory-safe language</p></li></ol><p>Our goal isn’t to build a platform that makes you, as the developer, choose between runtimes and ask, “should I use Firecracker or gVisor?”. We needed this flexibility to run workloads with different needs efficiently, including workloads that depend on GPUs. 
gVisor has <a href="https://gvisor.dev/docs/user_guide/gpu/"><u>GPU support</u></a>, while Firecracker microVMs currently <a href="https://github.com/firecracker-microvm/firecracker/issues/1179"><u>do not</u></a>.</p><p>gVisor’s main component is an application kernel (called Sentry) that implements a Linux-like interface but is written in a memory-safe language (Go) and runs in userspace. It works by intercepting application system calls and acting as the guest kernel, without the need for translation through virtualized hardware.</p><p>The resource footprint of a containerized application running on gVisor is lower than that of a VM, because it does not require managing virtualized hardware and booting up a kernel instance. However, this comes at the price of reduced application compatibility and higher per-system-call overhead.</p><p>To add GPU support, the Google team introduced <a href="https://gvisor.dev/docs/user_guide/gpu/"><u>nvproxy</u></a>, which works using the same principles as described above for syscalls: it intercepts <a href="https://en.wikipedia.org/wiki/Ioctl"><u>ioctls</u></a> destined for the GPU and proxies a subset to the GPU kernel module.</p>
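<p>Looping back to the first challenge, the two-level, memory-aware placement can be sketched as a greedy pass: place the most memory-hungry models first, and put each one on the candidate GPU it fits most tightly. This is only an illustrative sketch; every name and number below is hypothetical, not our scheduler’s actual API.</p>

```javascript
// Hypothetical sketch of memory-aware model placement within one location.
// The global scheduler has already picked a location with free GPU memory;
// this local pass assigns models to specific GPUs.
function placeModels(models, gpus) {
  // Place the largest models first (descending bin packing), so they
  // aren't starved by many small models grabbing fragments of memory.
  const sorted = [...models].sort((a, b) => b.memGB - a.memGB);
  const placements = new Map();
  for (const model of sorted) {
    // Best fit: the GPU with the least free memory that still fits,
    // keeping large contiguous blocks free for future large models.
    const fit = gpus
      .filter((g) => g.freeGB >= model.memGB)
      .sort((a, b) => a.freeGB - b.freeGB)[0];
    if (!fit) continue; // no capacity here; the global scheduler tries elsewhere
    fit.freeGB -= model.memGB;
    placements.set(model.name, fit.id);
  }
  return placements;
}

const gpus = [
  { id: "gpu-0", freeGB: 80 },
  { id: "gpu-1", freeGB: 24 },
];
const models = [
  { name: "llama-70b", memGB: 70 },
  { name: "llama-8b", memGB: 16 },
];
const result = placeModels(models, gpus);
// The 70 GB model lands on the large GPU; the small model moves to the other one.
```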
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1jqt4l5joKUkPNek1xACaM/64df1f3dcab2cae7d9ecf05deb407242/image__19_.png" />
          </figure><p>To solve the third challenge, and make scheduling fast with large models, we weren’t satisfied with the status quo. So we did something about it.</p>
    <div>
      <h2>Docker pull was too slow, so we fixed it (and cut the time in half)</h2>
      <a href="#docker-pull-was-too-slow-so-we-fixed-it-and-cut-the-time-in-half">
        
      </a>
    </div>
    <p>Many of the images we need to run for AI inference are over 15 GB. Specialized inference libraries and GPU drivers add up fast. For example, when we make a scheduling decision to run a fresh container in Tokyo, naively running docker pull to fetch the image from a storage bucket in Los Angeles would be unacceptably slow. And scheduling speed is critical to being able to scale up and down in new locations in response to changes in traffic.</p><p>We had 3 essential requirements:</p><ul><li><p>Pulling and pushing very large images should be fast</p></li><li><p>We should not rely on a single point of failure</p></li><li><p>Our teams shouldn’t waste time managing image registries</p></li></ul><p>We needed globally distributed storage, so we used <a href="https://developers.cloudflare.com/r2/"><u>R2</u></a>. We needed the highest cache hit rate possible, so we used Cloudflare’s <a href="https://developers.cloudflare.com/cache/"><u>Cache</u></a>, and will soon use <a href="https://developers.cloudflare.com/cache/how-to/tiered-cache/"><u>Tiered Cache</u></a>. And we needed a fast container image registry that we could run everywhere, in every Cloudflare location, so we built and open-sourced <a href="https://github.com/cloudflare/serverless-registry"><u>serverless-registry</u></a>, which is built on <a href="https://developers.cloudflare.com/workers/"><u>Workers</u></a>. You can deploy serverless-registry to your own Cloudflare account in about 5 minutes. We rely on it in production.</p><p>This is fast, but we can be faster. Our performance bottleneck was, somewhat surprisingly, <a href="https://docs.docker.com/reference/cli/docker/image/push/"><u>docker push</u></a>. Docker uses <a href="https://www.gzip.org/"><u>gzip</u></a> to compress and decompress layers of images while pushing and pulling. 
So we started using <a href="https://github.com/facebook/zstd"><u>Zstandard</u></a> (zstd) instead, which compresses and decompresses faster, and results in smaller compressed files.</p><p>In order to build, chunk, and push these images to the R2 registry, we built a custom CLI tool that we use internally in lieu of running docker build and docker push. This makes it easy to use zstd and split layers into 500 MB chunks, which allows uploads to be processed by Workers while staying under <a href="https://developers.cloudflare.com/workers/platform/limits/#request-limits"><u>body size limits</u></a>.</p><p>Using our custom build and push tool doubled the speed of image pulls. Our 30 GB GPU images now pull in 4 minutes instead of 8. We plan on open sourcing this tool in the near future.</p>
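<p>The chunking step can be sketched as follows: split a compressed layer into fixed-size pieces so that each upload request stays under the Workers request body size limit (the names and the exact arithmetic here are illustrative):</p>

```javascript
// Illustrative sketch: split a (zstd-compressed) layer blob into 500 MB
// chunks, one upload request per chunk.
const CHUNK_SIZE = 500 * 1024 * 1024; // 500 MB per chunk

function chunkLayer(layerSizeBytes, chunkSize = CHUNK_SIZE) {
  // Returns [start, end) byte ranges, one per upload request.
  const chunks = [];
  for (let offset = 0; offset < layerSizeBytes; offset += chunkSize) {
    chunks.push({ start: offset, end: Math.min(offset + chunkSize, layerSizeBytes) });
  }
  return chunks;
}

// A 1200 MB layer needs three uploads: 500 MB + 500 MB + 200 MB.
const ranges = chunkLayer(1200 * 1024 * 1024);
```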
    <div>
      <h2>Anycast is the gift that keeps on simplifying — Virtual IPs and the Global State Router</h2>
      <a href="#anycast-is-the-gift-that-keeps-on-simplifying-virtual-ips-and-the-global-state-router">
        
      </a>
    </div>
    <p>We still had another challenge to solve. And yes, we solved it with anycast. We’re Cloudflare; did you expect anything else?</p><p>First, a refresher — Cloudflare operates <a href="https://blog.cloudflare.com/unimog-cloudflares-edge-load-balancer/"><u>Unimog</u></a>, a Layer 4 load balancer that handles all incoming Cloudflare traffic. Cloudflare’s network uses <a href="https://www.cloudflare.com/learning/cdn/glossary/anycast-network"><u>anycast</u></a>, which allows a single IP address to route requests to a variety of locations. For most Cloudflare services with anycast, the given IP address will route to the nearest Cloudflare data center, reducing latency. Since Cloudflare runs almost every service in every data center, Unimog can simply route traffic to any Cloudflare metal that is online and has capacity, without needing to map traffic to a specific service that runs on specific metals, only in some locations.</p><p>The new compute-heavy, GPU-backed workloads we were taking on forced us to confront this fundamental “everything runs everywhere” assumption. If we run a containerized workload in 20 Cloudflare locations, how does Unimog know which locations, and which metals, it runs in? You might say “just bring your own <a href="https://developers.cloudflare.com/reference-architecture/architectures/load-balancing/"><u>load balancer</u></a>” — but then what happens when you make scheduling decisions to migrate a workload between locations, scale up, or scale down?</p><p>Anycast is foundational to how we build fast and simple products on our network, and we needed a way to keep building new types of products this way — where a team can <a href="https://www.cloudflare.com/developer-platform/solutions/hosting/">deploy an application</a>, get back a single IP address, and rely on the platform to balance traffic, taking load, container health, and latency into account, without extra configuration. 
We started letting teams use the container platform without solving this, and it was painfully clear that we needed to do something about it.</p><p>So we started integrating directly into our networking stack, building a sidecar service to Unimog. We’ll call this the Global State Router. Here’s how it works:</p><ul><li><p>An eyeball makes a request to a virtual IP address issued by Cloudflare.</p></li><li><p>The request is sent to the best location as determined by BGP routing; this is anycast routing.</p></li><li><p>A small <a href="https://ebpf.io/what-is-ebpf/"><u>eBPF</u></a> program sits on the main networking interface and ensures packets bound to a virtual IP address are handled by the Global State Router.</p></li><li><p>The main Global State Router program contains a mapping of all anycast IP addresses to potential end-destination container IP addresses. It updates this mapping based on container health, readiness, distance, and latency. Using this information, it picks a best-fit container.</p></li><li><p>Packets are forwarded at Layer 4.</p></li><li><p>When a target container’s server receives a packet, its own Global State Router program intercepts the packet and routes it to the local container.</p></li></ul>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/8ZICyfggbTEpkxki0OnYd/13a427a65b1e9d0c5d591468e3f10612/BLOG-2573_5.png" />
</figure><p>This might sound like just a low-level networking detail, disconnected from developer experience. But by integrating directly with Unimog, we can let developers:</p><ol><li><p>Push a containerized application to Cloudflare.</p></li><li><p>Provide constraints, health checks, and load metrics that describe what the application needs.</p></li><li><p>Delegate the scheduling and scaling of many containers across Cloudflare’s network.</p></li><li><p>Get back a single IP address that can be used everywhere.</p></li></ol><p>We’re actively working on this, and are excited to keep building on Cloudflare’s anycast capabilities, preserving the simplicity of running “everywhere” for new categories of workloads.</p>
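<p>The Global State Router’s selection step, much simplified, can be sketched like this. The real router is an eBPF program operating on packets with far richer state; this hypothetical JavaScript only illustrates the mapping and the best-fit choice:</p>

```javascript
// Hypothetical sketch: given the mapping from a virtual IP to candidate
// container endpoints, pick a best-fit container using health, readiness,
// and latency.
function pickContainer(vip, mapping) {
  const candidates = (mapping.get(vip) || []).filter(
    (c) => c.healthy && c.ready
  );
  if (candidates.length === 0) return null;
  // Among healthy, ready candidates, prefer the lowest-latency one.
  return candidates.reduce((best, c) =>
    c.latencyMs < best.latencyMs ? c : best
  );
}

const mapping = new Map([
  ["198.51.100.7", [
    { ip: "10.0.1.5", healthy: true, ready: true, latencyMs: 12 },
    { ip: "10.0.2.9", healthy: true, ready: false, latencyMs: 3 }, // still booting
    { ip: "10.0.3.4", healthy: false, ready: true, latencyMs: 1 }, // failing health checks
  ]],
]);
const target = pickContainer("198.51.100.7", mapping);
// Only the healthy, ready candidate is eligible, despite its higher latency.
```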
    <div>
      <h2>Low latency &amp; global — Remote Browser Isolation &amp; Browser Rendering</h2>
      <a href="#low-latency-global-remote-browser-isolation-browser-rendering">
        
      </a>
    </div>
    <p>Our container platform actually started because of a very specific challenge: running <a href="https://www.cloudflare.com/zero-trust/products/browser-isolation/"><u>Remote Browser Isolation</u></a> across our network. Remote Browser Isolation provides Chromium browsers that run on Cloudflare, in containers, rather than on the end user’s own computer. Only the rendered output is sent to the end user. This provides a layer of protection against zero-day browser vulnerabilities, phishing attacks, and ransomware.</p><p>Location is critical — people expect their interactions with a remote browser to feel just as fast as if it ran locally. If the server is thousands of miles away, the remote browser will feel slow. Running across <a href="https://www.cloudflare.com/network/"><u>Cloudflare’s network of over 330 locations</u></a> means the browser is nearly always as close to you as possible.</p><p>Imagine a user in Santiago, Chile. If they were to access a browser running in the same city, each interaction would incur negligible additional latency, whereas a browser in Buenos Aires might add 21 ms, São Paulo 48 ms, Bogotá 67 ms, and Raleigh, NC, 128 ms. Where the container runs significantly impacts the latency of every user interaction with the browser, and therefore the experience as a whole.</p><p>It’s not just browser isolation that benefits from running near the user: WebRTC servers stream video better, multiplayer games have less lag, online advertisements can be served faster, and financial transactions can be processed faster. Our container platform lets us run anything we need near the user, no matter where they are in the world.</p>
    <div>
      <h2>Using spare compute — “off-peak” jobs for Workers CI/CD builds</h2>
      <a href="#using-spare-compute-off-peak-jobs-for-workers-ci-cd-builds">
        
      </a>
    </div>
    <p>At any hour of the day, Cloudflare has many CPU cores that sit idle. This is compute power that could be used for something else.</p><p>Via anycast, most of Cloudflare’s traffic is handled as close as possible to the eyeball (person) that requested it. Most of our traffic originates from eyeballs. And the eyeballs of (most) people are closed and asleep between midnight and 5:00 AM local time. While we use our compute capacity very efficiently during the daytime in any part of the world, overnight we have spare cycles. Consider what a map of the world looks like at nighttime in Europe and Africa:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/CtkNFxhvusjLx1khXh9gs/25b5a6579d8abd47bcaa15ba28eddc61/BLOG-2573_6.png" />
          </figure><p>As shown above, we can run containers during “off-peak” in Cloudflare locations receiving low traffic at night. During this time, the CPU utilization of a typical Cloudflare metal looks something like this:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6sXsQ3p19bFecokbR0X1yj/04419589e4f6796fe297435cd1029b5a/Screenshot_2024-09-27_at_2.20.50_PM.png" />
          </figure><p>We have many “background” compute workloads at Cloudflare. These are workloads that don’t actually need to run close to the eyeball because there is no eyeball waiting on the request. The challenge is that many of these workloads require running untrusted code — either a dependency on open-source code that we don’t trust enough to run outside of a sandboxed environment, or untrusted code that customers deploy themselves. And unlike <a href="https://developers.cloudflare.com/workers/configuration/cron-triggers/"><u>Cron Triggers</u></a>, which already make a best-effort attempt to use off-peak compute, these other workloads can’t run in <a href="https://developers.cloudflare.com/workers/reference/how-workers-works/#isolates"><u>v8 isolates</u></a>.</p><p>On Builder Day 2024, we <a href="https://blog.cloudflare.com/builder-day-2024-announcements"><u>announced Workers Builds in open beta</u></a>. You connect your Worker to a git repository, and Cloudflare builds and deploys your Worker each time you merge a pull request. Workers Builds run on our containers platform, using otherwise idle “off-peak” compute, allowing us to offer lower pricing, and hold more capacity for unexpected spikes in traffic in Cloudflare locations during daytime hours when load is highest. We preserve our ability to serve requests as close to the eyeball as possible where it matters, while using the full compute capacity of our network.</p><p>We developed a purpose-built API for these types of jobs. The Workers Builds service has zero knowledge of where Cloudflare has spare compute capacity on its network — it simply schedules an “off-peak” job to run on the containers platform, by defining a scheduling policy:</p><p><code>scheduling_policy: "off-peak"</code></p>
    <div>
      <h2>Making off-peak jobs faster with prewarmed images</h2>
      <a href="#making-off-peak-jobs-faster-with-prewarmed-images">
        
      </a>
    </div>
    <p>Just because a workload isn’t “eyeball-facing” doesn’t mean speed isn’t relevant. When a build job starts, you still want it to start as soon as possible.</p><p>Each new build requires a fresh container, though, and we must avoid reusing containers to provide strong isolation between customers. How can we keep build job start times low, while using a new container for each job, without over-provisioning?</p><p>We prewarm servers with the proper image.</p><p>Before a server becomes eligible to receive an “off-peak” job, the container platform instructs it to download the correct image. Once the image is downloaded and cached locally, new containers can start quickly in a Firecracker VM after receiving a request for a new build. When a build completes, we throw away the container, and start the next build using a fresh container based on the prewarmed image.</p><p>Without prewarming, pulling and unpacking our Workers Builds images would take roughly 75 seconds. With prewarming, we’re able to spin up a new container in under 10 seconds. We expect this to get even faster as we introduce optimizations like pre-booting images before new runs, or <a href="https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/snapshot-support.md"><u>Firecracker snapshotting</u></a>, which can restore a VM in under 200ms.</p>
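<p>The prewarming flow can be sketched as a small state machine: a server only becomes eligible once the image is cached, and every build gets a fresh, single-use container. All names here are hypothetical:</p>

```javascript
// Hypothetical sketch of the prewarming flow: cache the image before
// accepting jobs, then hand each build a brand-new container and discard
// it afterward (containers are never reused across customers).
class BuildServer {
  constructor() {
    this.imageCached = false;
    this.nextContainerId = 0;
  }
  prewarm() {
    // In production this is the expensive part (roughly 75 seconds of
    // pulling and unpacking) that no build job has to wait on.
    this.imageCached = true;
  }
  runBuild() {
    if (!this.imageCached) throw new Error("not eligible: image not prewarmed");
    // Fresh container per build, created from the locally cached image.
    const containerId = this.nextContainerId++;
    // ... the build runs inside the container ...
    return { containerId, discarded: true };
  }
}

const server = new BuildServer();
server.prewarm();
const first = server.runBuild();
const second = server.runBuild(); // a different, fresh container
```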
    <div>
      <h2>Workers and containers, better together</h2>
      <a href="#workers-and-containers-better-together">
        
      </a>
    </div>
    <p>As more of our own engineering teams rely on our containers platform in production, we’ve noticed a pattern: they want a deeper integration with Workers.</p><p>We plan to give it to them. </p><p>Let’s take a look at a project deployed on our container platform already, <a href="https://blog.cloudflare.com/key-transparency/"><u>Key Transparency</u></a>. If the container platform were highly integrated with Workers, what would this team’s experience look like?</p><p>Cloudflare regularly audits changes to public keys used by WhatsApp for encrypting messages between users. Much of the architecture is built on Workers, but there are long-running compute-intensive tasks that are better suited for containers.</p><p>We don’t want our teams to have to jump through hoops to deploy a container and integrate with Workers. They shouldn’t have to pick specific regions to run in, figure out scaling, expose IPs and handle IP updates, or set up Worker-to-container auth.</p><p>We’re still exploring many different ideas and API designs, and we want your feedback. But let’s imagine what it might look like to use Workers, Durable Objects and Containers together.</p><p>In this case, an outer layer of Workers handles most business logic and ingress, a specialized Durable Object is configured to run alongside our new container, and the platform ensures the image is loaded on the right metals and can scale to meet demand.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5GsXRuYvVELJY1syOmfq3z/47d3481116670de291be86438777fbc0/BLOG-2573_8.png" />
          </figure><p>I add a containerized app to the <a href="https://developers.cloudflare.com/workers/wrangler/configuration/"><code><u>wrangler.toml </u></code><u>configuration file</u></a> of my Worker (or <a href="https://blog.cloudflare.com/automatically-generating-cloudflares-terraform-provider"><u>Terraform</u></a>):</p>
            <pre><code>[[container-app]]
image = "./key-transparency/verifier/Dockerfile"
name = "verifier"

[durable_objects]
bindings = [{ name = "VERIFIER", class_name = "Verifier", container = "verifier" }]
</code></pre>
            <p>Then, in my Worker, I call the runVerification <a href="https://developers.cloudflare.com/workers/runtime-apis/rpc/"><u>RPC method</u></a> of my Durable Object:</p>
            <pre><code>fetch(request, env, ctx) {
  const id = new URL(request.url).searchParams.get('id')
  const durableObjectId = env.VERIFIER.idFromName(request.params.id);
  await env.VERIFIER.get(durableObjectId).runVerification()
  //...
}
</code></pre>
            <p>From my Durable Object I can boot, configure, mount storage buckets as directories, and make HTTP requests to the container:</p>
            <pre><code>class Verifier extends DurableObject {
  constructor(ctx, env) {
    super(ctx, env);
    this.ctx.blockConcurrencyWhile(async () =&gt; {

      // starts the container
      await this.ctx.container.start();

      // configures the container before accepting traffic
      const config = await this.ctx.storage.get("verifierConfig");
      await this.ctx.container.fetch("/set-config", { method: "PUT", body: config });
    });
  }

  async runVerification(updateId) {
    // downloads &amp; mounts latest updates from R2
    const latestPublicKeyUpdates = await this.env.R2.get(`public-key-updates/${updateId}`);
    await this.ctx.container.mount(`/updates/${updateId}`, latestPublicKeyUpdates);

    // starts verification via HTTP call 
    return await this.ctx.container.fetch(`/verifier/${updateId}`);
  }
}
</code></pre>
            <p>And… that’s it.</p><p>I didn’t have to worry about placement, scaling, service discovery, or authorization, and I was able to leverage integrations into other services like KV and R2 with just a few lines of code. The container platform took care of routing, placement, and auth. If I needed more instances, I could call the binding with a new ID, and the platform would scale up containers for me.</p><p>We are still in the early stages of building these integrations, but we’re excited about everything that containers will bring to Workers and vice versa.</p>
    <div>
      <h2>So, what do you want to build?</h2>
      <a href="#so-what-do-you-want-to-build">
        
      </a>
    </div>
    <p>If you’ve read this far, there’s a non-zero chance you were hoping to get to run a container yourself on our network. While we’re not ready (quite yet) to open up the platform to everyone, now that we’ve built a few GA products on our container platform, we’re looking for a handful of engineering teams to start building, in advance of wider availability in 2025. And we’re <a href="https://job-boards.greenhouse.io/cloudflare/jobs/5547603"><u>continuing to hire engineers</u></a> to work on this.</p><p>We’ve told you about our use cases for containers, and now it’s your turn. If you’re interested, <a href="https://forms.gle/msrkBLBYNFFYRaqY8"><u>tell us here</u></a> what you want to build, and why it goes beyond what’s possible today in <a href="https://developers.cloudflare.com/workers/"><u>Workers</u></a> and on our <a href="https://developers.cloudflare.com/products/?product-group=Developer+platform"><u>Developer Platform</u></a>. What do you wish you could build on Cloudflare, but can’t yet today?</p> ]]></content:encoded>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Durable Objects]]></category>
            <guid isPermaLink="false">5KJejYd85inMDIZUZ6z17D</guid>
            <dc:creator>Brendan Irvine-Broque</dc:creator>
            <dc:creator>Thomas Lefebvre</dc:creator>
            <dc:creator>Mike Nomitch</dc:creator>
        </item>
        <item>
            <title><![CDATA[Leveling up Workers AI: general availability and more new capabilities]]></title>
            <link>https://blog.cloudflare.com/workers-ai-ga-huggingface-loras-python-support/</link>
            <pubDate>Tue, 02 Apr 2024 13:01:00 GMT</pubDate>
            <description><![CDATA[ Today, we’re excited to make a series of announcements, including Workers AI, Cloudflare’s inference platform becoming GA and support for fine-tuned models with LoRAs and one-click deploys from HuggingFace. Cloudflare Workers now supports the Python programming language, and more ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1YNXJ4s4e47U7MvddTlpz8/3a53be280a5e373b589eba37bc4740d0/Cities-with-GPUs-momentum-update.png" />
            
            </figure><p>Welcome to Tuesday – our AI day of Developer Week 2024! In this blog post, we’re excited to share an overview of our new AI announcements and vision, including news about Workers AI officially going GA with improved pricing, a GPU hardware momentum update, an expansion of our Hugging Face partnership, Bring Your Own LoRA fine-tuned inference, Python support in Workers, more providers in AI Gateway, and Vectorize metadata filtering.</p>
    <div>
      <h3>Workers AI GA</h3>
      <a href="#workers-ai-ga">
        
      </a>
    </div>
    <p>Today, we’re excited to announce that our Workers AI inference platform is now Generally Available. After months of being in open beta, we’ve improved our service with greater reliability and performance, unveiled pricing, and added many more models to our catalog.</p>
    <div>
      <h4>Improved performance &amp; reliability</h4>
      <a href="#improved-performance-reliability">
        
      </a>
    </div>
    <p>With Workers AI, our goal is to make AI inference as reliable and easy to use as the rest of Cloudflare’s network. Under the hood, we’ve upgraded the load balancing that is built into Workers AI. Requests can now be routed to more GPUs in more cities, and each city is aware of the total available capacity for AI inference. If the request would have to wait in a queue in the current city, it can instead be routed to another location, getting results back to you faster when traffic is high. With this, we’ve increased rate limits across all our models – most LLMs now have a rate limit of 300 requests per minute, up from 50 requests per minute during our beta phase. Smaller models have a limit of 1500-3000 requests per minute. Check out our <a href="https://developers.cloudflare.com/workers-ai/platform/limits/">Developer Docs for the rate limits</a> of individual models.</p>
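<p>Conceptually, the capacity-aware routing works like this sketch: try the nearest city first, and fall back to the next closest city with free inference capacity rather than queueing. All names are hypothetical, and the real load balancer weighs more signals than this:</p>

```javascript
// Hypothetical sketch: route an inference request to the first city
// (nearest-first order) with spare capacity; if every candidate is
// saturated, accept the queue in the nearest city.
function routeInference(cities) {
  const withCapacity = cities.find((c) => c.inFlight < c.capacity);
  return withCapacity ? withCapacity.name : cities[0].name;
}

const cities = [
  { name: "Tokyo", inFlight: 128, capacity: 128 }, // saturated: would queue
  { name: "Osaka", inFlight: 40, capacity: 128 },  // has headroom
];
const routed = routeInference(cities);
```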
    <div>
      <h4>Lowering costs on popular models</h4>
      <a href="#lowering-costs-on-popular-models">
        
      </a>
    </div>
    <p>Alongside our GA of Workers AI, we published a <a href="https://ai.cloudflare.com/#pricing-calculator">pricing calculator</a> for our 10 non-beta models earlier this month. We want Workers AI to be one of the most affordable and accessible solutions to run <a href="https://www.cloudflare.com/learning/ai/inference-vs-training/">inference</a>, so we added a few optimizations to our models to make them more affordable. Now, Llama 2 is over 7x cheaper and Mistral 7B is over 14x cheaper to run than we had initially <a href="https://developers.cloudflare.com/workers-ai/platform/pricing/">published</a> on March 1. We want to continue to be the best platform for AI inference and will continue to roll out optimizations to our customers when we can.</p><p>As a reminder, our billing for Workers AI started on April 1st for our non-beta models, while beta models remain free and unlimited. We offer 10,000 <a href="/workers-ai#:~:text=may%20be%20wondering%20%E2%80%94-,what%E2%80%99s%20a%20neuron">neurons</a> per day for free to all customers. Workers Free customers will encounter a hard rate limit after 10,000 neurons in 24 hours, while Workers Paid customers will incur usage at $0.011 per 1000 additional neurons. Read our <a href="https://developers.cloudflare.com/workers-ai/platform/pricing/">Workers AI Pricing Developer Docs</a> for the most up-to-date information on pricing.</p>
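<p>Using the numbers above, a back-of-envelope daily cost for a Workers Paid account looks like this:</p>

```javascript
// Back-of-envelope neuron pricing: 10,000 free neurons per day, then
// $0.011 per 1,000 additional neurons on Workers Paid.
function dailyCostUSD(neuronsUsed) {
  const FREE_NEURONS = 10_000;
  const PRICE_PER_1000 = 0.011;
  const billable = Math.max(0, neuronsUsed - FREE_NEURONS);
  return (billable / 1000) * PRICE_PER_1000;
}

// 25,000 neurons in a day: 15,000 billable, about $0.165.
const cost = dailyCostUSD(25_000);
// Usage within the free allocation costs nothing.
const freeTier = dailyCostUSD(5_000);
```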
    <div>
      <h4>New dashboard and playground</h4>
      <a href="#new-dashboard-and-playground">
        
      </a>
    </div>
    <p>Lastly, we’ve revamped our <a href="https://dash.cloudflare.com/?to=/:account/ai/workers-ai">Workers AI dashboard</a> and <a href="https://playground.ai.cloudflare.com/">AI playground</a>. The Workers AI page in the Cloudflare dashboard now shows analytics for usage across models, including neuron calculations to help you better predict pricing. The AI playground lets you quickly test and compare different models and configure prompts and parameters. We hope these new tools help developers start building on Workers AI seamlessly – go try them out!</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3uSiHo9pV21DreFiLpUfPX/5aa5c8a2448da881a0872e3f550c39a2/image3-3.png" />
            
            </figure>
    <div>
      <h3>Run inference on GPUs in over 150 cities around the world</h3>
      <a href="#run-inference-on-gpus-in-over-150-cities-around-the-world">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/qyZ48xVu70u80swQKPUau/1e85ea2b833139d277d5769d5a8c8c66/image5-2.png" />
            
            </figure><p>When we announced Workers AI back in September 2023, we set out to deploy GPUs to our data centers around the world. We plan to deliver on that promise and deploy inference-tuned GPUs almost everywhere by the end of 2024, making us the most widely distributed cloud-AI inference platform. We have over 150 cities with GPUs today and will continue to roll out more throughout the year.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/451IgQCfz2j10xM8kXhZy0/67632b1b5f52e3387aa387e43c804324/image7-1.png" />
            
</figure><p>We also have our next generation of compute servers with GPUs launching in Q2 2024, which means better performance, power efficiency, and reliability than previous generations. We provided a preview of our Gen 12 Compute server design in a <a href="/cloudflare-gen-12-server-bigger-better-cooler-in-a-2u1n-form-factor">December 2023 blog post</a>, with more details to come. With Gen 12 and future planned hardware launches, the next step is to support larger machine learning models and offer fine-tuning on our platform. This will allow us to achieve higher inference throughput, lower latency, and greater availability for production workloads, as well as expanded support for new categories of workloads, such as fine-tuning.</p>
    <div>
      <h3>Hugging Face Partnership</h3>
      <a href="#hugging-face-partnership">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7ATIA1mxaeqHYE31cztdjg/33717b0fd20244f3089cf5d4c8a9c13f/image2-2.png" />
            
            </figure><p>We’re also excited to continue our partnership with Hugging Face in the spirit of bringing the best of open-source to our customers. Now, you can visit some of the most popular models on Hugging Face and easily click to run the model on Workers AI if it is available on our platform.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3geQJnlHZhMktG1eoEWYKE/36c76ee43eca9443b4e2e9c6d3e1df7e/image6-1.png" />
            
            </figure><p>We’re happy to announce that we’ve added 4 more models to our platform in conjunction with Hugging Face. You can now access the new <a href="https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2">Mistral 7B v0.2</a> model with improved context windows, <a href="https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B">Nous Research’s Hermes 2 Pro</a> fine-tuned version of Mistral 7B, <a href="https://huggingface.co/google/gemma-7b-it">Google’s Gemma 7B</a>, and <a href="https://huggingface.co/Nexusflow/Starling-LM-7B-beta">Starling-LM-7B-beta</a> fine-tuned from OpenChat. There are currently 14 models that we’ve curated with Hugging Face to be available for serverless GPU inference powered by Cloudflare’s Workers AI platform, with more coming soon. These models are all served using Hugging Face’s technology with a <a href="https://github.com/huggingface/text-generation-inference/">TGI</a> backend, and we work closely with the Hugging Face team to curate, optimize, and deploy these models.</p><blockquote><p><i>“We are excited to work with Cloudflare to make AI more accessible to developers. Offering the most popular open models with a serverless API, powered by a global fleet of GPUs is an amazing proposition for the Hugging Face community, and I can’t wait to see what they build with it.”</i>- <b>Julien Chaumond</b>, Co-founder and CTO, Hugging Face</p></blockquote><p>You can find all of the open models supported in Workers AI in this <a href="https://huggingface.co/collections/Cloudflare/hf-curated-models-available-on-workers-ai-66036e7ad5064318b3e45db6">Hugging Face Collection</a>, and the “Deploy to Cloudflare Workers AI” button is at the top of each model card. To learn more, read Hugging Face’s <a href="http://huggingface.co/blog/cloudflare-workers-ai">blog post</a> and take a look at our <a href="https://developers.cloudflare.com/workers-ai/models/">Developer Docs</a> to get started. Have a model you want to see on Workers AI? 
Send us a message on <a href="https://discord.cloudflare.com">Discord</a> with your request.</p>
    <div>
      <h3>Supporting fine-tuned inference - BYO LoRAs</h3>
      <a href="#supporting-fine-tuned-inference-byo-loras">
        
      </a>
    </div>
    <p>Fine-tuned inference is one of our most requested features for Workers AI, and we’re one step closer now with Bring Your Own (BYO) LoRAs. Using the popular <a href="https://www.cloudflare.com/learning/ai/what-is-lora/">Low-Rank Adaptation</a> method, researchers have figured out how to take a model and adapt <i>some</i> model parameters to the task at hand, rather than rewriting <i>all</i> model parameters like you would for a fully fine-tuned model. This means that you can get fine-tuned model outputs without the computational expense of fully fine-tuning a model.</p><p>We now support bringing trained LoRAs to Workers AI, where we apply the LoRA adapter to a base model at runtime to give you fine-tuned inference at a fraction of the cost and size of a fully fine-tuned model. In the future, we want to be able to support fine-tuning jobs and fully fine-tuned models directly on our platform, but we’re excited to be one step closer today with LoRAs.</p>
            <pre><code>const response = await ai.run(
  "@cf/mistralai/mistral-7b-instruct-v0.2-lora", //the model supporting LoRAs
  {
      messages: [{"role": "user", "content": "Hello world"}],
      raw: true, //skip applying the default chat template
      lora: "00000000-0000-0000-0000-000000000000", //the fine-tune ID or name
  }
);</code></pre>
            <p>BYO LoRAs is in open beta as of today for Gemma 2B and 7B, Llama 2 7B, and Mistral 7B models with LoRA adapters up to 100MB in size and a max rank of 8, and up to 30 total LoRAs per account. As always, we expect you to use Workers AI and our new BYO LoRA feature with our <a href="https://www.cloudflare.com/service-specific-terms-developer-platform/#developer-platform-terms">Terms of Service</a> in mind, including any model-specific restrictions on use contained in the models’ license terms.</p><p>Read the technical deep dive blog post on <a href="/fine-tuned-inference-with-loras">fine-tuning with LoRA</a> and <a href="https://developers.cloudflare.com/workers-ai/fine-tunes">developer docs</a> to get started.</p>
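            <p>As a rough sense of scale (a back-of-envelope sketch with illustrative dimensions, not tied to any specific model), a LoRA adapter replaces a full weight-matrix update with two small low-rank factors, which is why rank-8 adapters stay well under the size limit above:</p>
            <pre><code>// Back-of-envelope parameter counts: a full update to a dOut x dIn weight
// matrix vs. a rank-r LoRA adapter (factors B: dOut x r and A: r x dIn).
// The 4096 x 4096 dimensions are illustrative, not from a specific model.
function fullUpdateParams(dOut, dIn) {
  return dOut * dIn;
}

function loraAdapterParams(dOut, dIn, r) {
  return dOut * r + r * dIn;
}

const full = fullUpdateParams(4096, 4096);        // 16,777,216 parameters
const adapter = loraAdapterParams(4096, 4096, 8); // 65,536 parameters (~0.4%)</code></pre>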
    <div>
      <h3>Write Workers in Python</h3>
      <a href="#write-workers-in-python">
        
      </a>
    </div>
    <p>Python is the second most popular programming language in the world (after JavaScript) and the language of choice for building AI applications. And starting today, in open beta, you can now <a href="https://ggu-python.cloudflare-docs-7ou.pages.dev/workers/languages/python/">write Cloudflare Workers in Python</a>. Python Workers support all <a href="https://developers.cloudflare.com/workers/configuration/bindings/">bindings</a> to resources on Cloudflare, including <a href="https://developers.cloudflare.com/vectorize/">Vectorize</a>, <a href="https://developers.cloudflare.com/d1/">D1</a>, <a href="https://developers.cloudflare.com/kv/">KV</a>, <a href="https://www.cloudflare.com/developer-platform/products/r2/">R2</a> and more.</p><p><a href="https://ggu-python.cloudflare-docs-7ou.pages.dev/workers/languages/python/packages/langchain/">LangChain</a> is the most popular framework for building LLM‑powered applications, and like how <a href="/langchain-and-cloudflare">Workers AI works with langchain-js</a>, the <a href="https://python.langchain.com/docs/get_started/introduction">Python LangChain library</a> works on Python Workers, as do <a href="https://ggu-python.cloudflare-docs-7ou.pages.dev/workers/languages/python/packages/">other Python packages</a> like FastAPI.</p><p>Workers written in Python are just as simple as Workers written in JavaScript:</p>
            <pre><code>from js import Response

async def on_fetch(request, env):
    return Response.new("Hello world!")</code></pre>
            <p>…and are configured by simply pointing at a .py file in your <code>wrangler.toml</code>:</p>
            <pre><code>name = "hello-world-python-worker"
main = "src/entry.py"
compatibility_date = "2024-03-18"
compatibility_flags = ["python_workers"]</code></pre>
            <p>There are no extra toolchain or precompilation steps needed. The <a href="https://pyodide.org/en/stable/">Pyodide</a> Python execution environment is provided for you, directly by the Workers runtime, mirroring how Workers written in JavaScript already work.</p><p>There’s lots more to dive into — take a look at the <a href="https://ggu-python.cloudflare-docs-7ou.pages.dev/workers/languages/python/">docs</a>, and check out our <a href="/python-workers">companion blog post</a> for details about how Python Workers work behind the scenes.</p>
    <div>
      <h2>AI Gateway now supports Anthropic, Azure, AWS Bedrock, Google Vertex, and Perplexity</h2>
      <a href="#ai-gateway-now-supports-anthropic-azure-aws-bedrock-google-vertex-and-perplexity">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6YaNMed9Aw4YjhbZk75SGI/a53b499b743e36ad2635357118e6623f/image4-2.png" />
            
            </figure><p>Our <a href="/announcing-ai-gateway">AI Gateway</a> product helps developers better control and observe their AI applications, with analytics, caching, rate limiting, and more. We are continuing to add more providers to the product, including Anthropic, Google Vertex, and Perplexity, which we’re excited to announce today. We quietly rolled out Azure and Amazon Bedrock support in December 2023, which means that the most popular providers are now supported via AI Gateway, including Workers AI itself.</p><p>Take a look at our <a href="https://developers.cloudflare.com/ai-gateway/">Developer Docs</a> to get started with AI Gateway.</p>
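            <p>Switching a provider to AI Gateway is typically just a base-URL change: each provider sits behind a per-provider path on your gateway endpoint, and requests keep the provider’s own request and response shapes. A minimal sketch, where the account ID and gateway name are hypothetical placeholders:</p>
            <pre><code>// Build an AI Gateway endpoint for a given provider. "my-account-id" and
// "my-gateway" are hypothetical placeholders; substitute your own values.
function gatewayUrl(accountId, gatewayName, provider, providerEndpoint) {
  return `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayName}/${provider}/${providerEndpoint}`;
}

// Point an existing SDK or fetch() call at the gateway instead of the provider:
const url = gatewayUrl("my-account-id", "my-gateway", "openai", "chat/completions");</code></pre>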
    <div>
      <h4>Coming soon: Persistent Logs</h4>
      <a href="#coming-soon-persistent-logs">
        
      </a>
    </div>
    <p>In Q2 2024, we will be adding persistent logs, so that you can push your logs (including prompts and responses) to <a href="https://www.cloudflare.com/learning/cloud/what-is-object-storage/">object storage</a>; custom metadata, so that you can tag requests with user IDs or other identifiers; and secrets management, so that you can securely manage your application’s API keys.</p><p>We want AI Gateway to be the control plane for your AI applications, allowing developers to dynamically evaluate and route requests to different models and providers. With our persistent logs feature, we want to enable developers to use their logged data to fine-tune models in one click, eventually running the fine-tune job and the fine-tuned model directly on our Workers AI platform. AI Gateway is just one product in our AI toolkit, but we’re excited about the workflows and use cases it can unlock for developers building on our platform, and we hope you’re excited about it too.</p>
    <div>
      <h3>Vectorize metadata filtering and future GA of million vector indexes</h3>
      <a href="#vectorize-metadata-filtering-and-future-ga-of-million-vector-indexes">
        
      </a>
    </div>
    <p>Vectorize is another component of our toolkit for AI applications. In open beta since September 2023, Vectorize allows developers to persist embeddings (vectors), like those generated from Workers AI <a href="https://developers.cloudflare.com/workers-ai/models/#text-embeddings">text embedding</a> models, and query for the closest match to support use cases like similarity search or recommendations. Without a vector database, model output is forgotten and can’t be recalled without extra costs to re-run a model.</p><p>Since Vectorize’s open beta, we’ve added <a href="https://developers.cloudflare.com/vectorize/reference/metadata-filtering/">metadata filtering</a>. Metadata filtering lets developers combine vector search with filtering for arbitrary metadata, supporting the query complexity required by AI applications. We’re laser-focused on getting Vectorize ready for general availability, with a target launch date of June 2024, which will include support for multi-million vector indexes.</p>
            <pre><code>// Insert vectors with metadata
const vectors: Array&lt;VectorizeVector&gt; = [
  {
    id: "1",
    values: [32.4, 74.1, 3.2],
    metadata: { url: "/products/sku/13913913", streaming_platform: "netflix" }
  },
  {
    id: "2",
    values: [15.1, 19.2, 15.8],
    metadata: { url: "/products/sku/10148191", streaming_platform: "hbo" }
  },
...
];
let upserted = await env.YOUR_INDEX.upsert(vectors);

// Query with metadata filtering
let metadataMatches = await env.YOUR_INDEX.query(&lt;queryVector&gt;, { filter: { streaming_platform: "netflix" }} )</code></pre>
            
    <div>
      <h3>The most comprehensive Developer Platform to build AI applications</h3>
      <a href="#the-most-comprehensive-developer-platform-to-build-ai-applications">
        
      </a>
    </div>
    <p>On Cloudflare’s Developer Platform, we believe that all developers should be able to quickly build and ship full-stack applications  – and that includes AI experiences as well. With our GA of Workers AI, announcements for Python support in Workers, AI Gateway, and Vectorize, and our partnership with Hugging Face, we’ve expanded the world of possibilities for what you can build with AI on our platform. We hope you are as excited as we are – take a look at all our <a href="https://developers.cloudflare.com">Developer Docs</a> to get started, and <a href="https://discord.cloudflare.com/">let us know</a> what you build.</p> ]]></content:encoded>
            <category><![CDATA[Developer Week]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Workers AI]]></category>
            <category><![CDATA[General Availability]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <guid isPermaLink="false">6ItPe1u2j71C4DTSxJdccB</guid>
            <dc:creator>Michelle Chen</dc:creator>
            <dc:creator>Jesse Kipp</dc:creator>
            <dc:creator>Syona Sarma</dc:creator>
            <dc:creator>Brendan Irvine-Broque</dc:creator>
            <dc:creator>Vy Ton</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare named a leader in Forrester Edge Development Platforms Wave, Q4 2023]]></title>
            <link>https://blog.cloudflare.com/forrester-wave-edge-development-2023/</link>
            <pubDate>Mon, 27 Nov 2023 18:08:50 GMT</pubDate>
            <description><![CDATA[ Forrester has recognized Cloudflare as a leader in The Forrester Wave™: Edge Development Platforms, Q4 2023 with the top score in the current offering category ]]></description>
            <content:encoded><![CDATA[ <p>Forrester has recognized Cloudflare as a leader in The Forrester Wave™: Edge Development Platforms, Q4 2023 with the top score in the current offering category.</p><p>According to <a href="http://www.cloudflare.com/lp/forrester-wave-edge-development-q4-2023/">the report</a> by Principal Analyst Devin Dickerson, “Cloudflare’s edge development platform provides the building blocks enterprises need to create full stack distributed applications and enables developers to take advantage of a globally distributed network of compute, storage and programmable security without being experts on CAP theorem.”</p><p>Over one million developers are building applications using the Developer Platform products, including <a href="https://workers.cloudflare.com/">Workers</a>, <a href="https://pages.cloudflare.com/">Pages</a>, <a href="https://developers.cloudflare.com/r2/">R2</a>, <a href="https://developers.cloudflare.com/kv/">KV</a>, <a href="https://developers.cloudflare.com/queues/">Queues</a>, <a href="https://developers.cloudflare.com/durable-objects/">Durable Objects</a>, <a href="https://developers.cloudflare.com/d1/">D1</a>, <a href="https://developers.cloudflare.com/stream/">Stream</a>, <a href="https://developers.cloudflare.com/images/">Images</a>, and more. Developers can easily deploy highly distributed, full-stack applications using Cloudflare’s full suite of compute, storage, and developer services.</p>
    <div>
      <h3>Workers make Cloudflare’s network programmable</h3>
      <a href="#workers-make-cloudflares-network-programmable">
        
      </a>
    </div>
    <blockquote><p><b><i>“A key strength of the platform is the interoperability with Cloudflare’s programmable global CDN combined with a deployment model that leverages intelligent workload placement.”–</i></b> <b>The Forrester Wave™: Edge Development Platforms, Q4 2023</b></p></blockquote><p>Workers run across <a href="https://www.cloudflare.com/network/">Cloudflare’s global network</a>, provide <a href="https://developers.cloudflare.com/workers/runtime-apis/cache/">APIs</a> to read from and write directly to the local cache, and expose context from Cloudflare’s CDN directly on the <a href="https://developers.cloudflare.com/workers/runtime-apis/request/#incomingrequestcfproperties">request object</a> that a Worker receives.</p><p>This close integration with Cloudflare’s network allows developers to build, protect, and connect globally distributed applications, without deploying to specific regions. <a href="https://developers.cloudflare.com/workers/configuration/smart-placement/">Smart Placement</a> optimizes Workers to run in the location that yields the fastest overall performance, whether it’s the location closest to the data, or closest to the user. <a href="https://developers.cloudflare.com/hyperdrive/learning/how-hyperdrive-works/">Hyperdrive</a> automatically pools database connections, allowing Workers running all over the world to reuse them when querying PostgreSQL databases, avoiding the scaling challenges that make it hard to use traditional databases with a serverless architecture. <a href="https://developers.cloudflare.com/workers/configuration/cron-triggers/">Cron Triggers</a> allow for up to 15 minutes of CPU time, allowing for compute-intensive background work.</p><p>Cloudflare is beyond edge computing — it’s everywhere computing. We use our network to make your application perform best, shaped by real-world data and tailored to access patterns and programming paradigms.</p>
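    <p>For example, the CDN context mentioned above is available on every request without any extra API calls. A minimal sketch (which fields are populated varies by request; see the request object docs linked above):</p>
            <pre><code>// Read CDN context from the request object inside a Worker.
// request.cf fields such as country and colo are populated by Cloudflare.
const worker = {
  async fetch(request) {
    const country = request.cf?.country ?? "unknown";
    const colo = request.cf?.colo ?? "unknown";
    return new Response(`Served near ${country} via ${colo}`);
  },
};
// In a real project, this object would be the Worker's default export.</code></pre>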
    <div>
      <h3>Deploy distributed systems, without being a distributed systems expert</h3>
      <a href="#deploy-distributed-systems-without-being-a-distributed-systems-expert">
        
      </a>
    </div>
    <blockquote><p><b><i>“Reference customers consistently call out the ease of onboarding, which sees developers with no prior background delivering workloads across the globe in minutes, and production quality applications within a week.”–</i></b> <b>The Forrester Wave™: Edge Development Platforms, Q4 2023</b></p></blockquote><p>Workers empower any developer to deploy globally distributed applications, without needing to become distributed systems experts or experts in configuring cloud infrastructure.</p><ul><li><p>When you deploy a Worker, behind the scenes Cloudflare distributes it across the globe. But to you, it’s a single application that you can <a href="https://developers.cloudflare.com/workers/observability/local-development-and-testing/">run and test locally</a>, using the same <a href="https://github.com/cloudflare/workerd">open-source JavaScript runtime</a> that your Workers run on in production.</p></li><li><p>When you deploy a <a href="https://developers.cloudflare.com/durable-objects/">Durable Object</a> to coordinate real-time state, you’ve built a distributed application, but instead of having to learn RPC protocols and scale infrastructure, you’ve programmed the whole thing in JavaScript using web standard APIs that front-end developers know and rely on daily.</p></li><li><p>Enqueuing and processing batches of messages with <a href="https://developers.cloudflare.com/queues/">Cloudflare Queues</a> takes adding just a few more lines of JavaScript to an existing Worker.</p></li><li><p>When you create a web application with <a href="https://pages.cloudflare.com/">Cloudflare Pages</a>, you’ve set up a complete continuous build and deployment pipeline with preview URLs, just by connecting to a GitHub repository.</p></li></ul><p>Developers who previously only wrote front-end code are able to build the back end, and make their app real-time and reactive. 
Teams stuck waiting for infrastructure experts to provision resources are able to start prototyping today rather than next week. Writing and deploying a Worker is familiar and accessible, and this lets engineering teams move faster, with less overhead.</p><p>Why are teams able to get started so quickly?</p><ol><li><p>Workers use <a href="https://developers.cloudflare.com/workers/runtime-apis/">web standard APIs</a> that front-end developers and anyone building web applications already use every day. Cloudflare was a founding member of the Web-interoperable Runtimes Community Group (<a href="https://wintercg.org/">WinterCG</a>) and is committed to <a href="/socket-api-works-javascript-runtimes-wintercg-polyfill-connect/">interoperability across runtimes</a>.</p></li><li><p>The tools developers already use every day are native to our platform. We publish <a href="https://www.npmjs.com/package/@cloudflare/workers-types">TypeScript</a> types for all APIs, and support compiling TypeScript when authoring and deploying via the Wrangler CLI or via the code editor in the Cloudflare dashboard — which itself is <a href="/improved-quick-edit/">powered by</a> the popular VSCode editor.</p></li><li><p>The open-source <a href="https://developers.cloudflare.com/pages/framework-guides/">frameworks</a> that developers prefer to build with are supported. A growing set of APIs from Node.js are <a href="https://developers.cloudflare.com/workers/runtime-apis/nodejs/">available natively</a> in the Workers runtime, allowing existing open source libraries to work on Workers. And increasingly, new open source projects that developers depend on are designed from day one to work across all WinterCG runtimes. Every day, more of the JavaScript ecosystem works on Workers.</p></li></ol>
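    <p>The Queues bullet above really is only a few lines. A minimal sketch, where MY_QUEUE is a hypothetical binding name configured in the project’s wrangler.toml:</p>
            <pre><code>// One Worker both produces and consumes: fetch() enqueues a message, and
// queue() processes messages in batches. MY_QUEUE is a hypothetical binding.
const worker = {
  async fetch(request, env) {
    await env.MY_QUEUE.send({ url: request.url, receivedAt: Date.now() });
    return new Response("Accepted", { status: 202 });
  },
  async queue(batch, env) {
    for (const message of batch.messages) {
      console.log("processing", message.body.url);
    }
  },
};
// In a real project, this object would be the Worker's default export.</code></pre>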
    <div>
      <h3>Expanding into AI with GPUs, LLMs, and more</h3>
      <a href="#expanding-into-ai-with-gpus-llms-and-more">
        
      </a>
    </div>
    <blockquote><p><b><i>“Its superior vision refuses to limit the future footprint to the edge, and their purposeful approach to building out capabilities on the roadmap suggests that it will be increasingly well positioned to take on public cloud hyperscalers for workloads.”–</i></b> <b>The Forrester Wave™: Edge Development Platforms, Q4 2023</b></p></blockquote><p>We are building a complete compute platform for production applications at scale. And as every company and every developer is now building or experimenting with AI, Cloudflare has <a href="/workers-ai">made GPUs an integrated part of our developer platform</a>. We’ve made it just as easy to get started with AI as we have to deliver a global workload. In mid-November, we hit our goal to have Workers AI Inference running in <a href="/workers-ai-update-stable-diffusion-code-llama-workers-ai-in-100-cities/">over 100 cities</a> around the world, and by the end of 2024 Workers AI will be running in nearly every city Cloudflare has a presence in.</p><p><a href="https://developers.cloudflare.com/workers-ai/">Workers AI</a> allows developers to build applications using the latest open-source AI models, without provisioning any infrastructure or paying for costly unused capacity. We’re extending this to support <a href="/partnering-with-hugging-face-deploying-ai-easier-affordable/">deploying models directly from Hugging Face to Workers AI</a>, for an even wider set of AI models. And unlike provisioning a VM with a GPU in a specific data center, we’re building this such that we can treat our whole network as one giant compute resource, running models in the right place at the right time to serve developers’ needs.</p><p>Beyond model inference, we’re doubling down on supporting web standard APIs and making the <a href="/webgpu-in-workers/">WebGPU</a> API available from within the Workers platform. 
While we’re proud to be recognized as a leading edge platform, we’re not just that — we are a platform for developing full-stack applications, even those that require compute power that just one year ago very few used or needed.</p><p>We’re excited to show you what’s next, including <a href="/secrets-store/">a new way to manage secrets</a> across Cloudflare products, improved observability, and better tools for releasing changes. Every day we see more advanced applications built on our platform, and we’re committed to matching that with tools to serve the most mission-critical workloads — the same ones we use ourselves to build our products on our own platform.</p><p>Download the report <a href="http://www.cloudflare.com/lp/forrester-wave-edge-development-q4-2023/">here</a>.</p> ]]></content:encoded>
            <category><![CDATA[Forrester]]></category>
            <guid isPermaLink="false">5KWOonIRXhvPbMK2rGbYyW</guid>
            <dc:creator>Brendan Irvine-Broque</dc:creator>
            <dc:creator>Dawn Parzych</dc:creator>
        </item>
        <item>
            <title><![CDATA[New Workers pricing — never pay to wait on I/O again]]></title>
            <link>https://blog.cloudflare.com/workers-pricing-scale-to-zero/</link>
            <pubDate>Thu, 28 Sep 2023 13:00:03 GMT</pubDate>
            <description><![CDATA[ Announcing new pricing for Cloudflare Workers, where you are billed based on CPU time, and never for the idle time that your Worker spends waiting on network requests and other I/O. ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1pAdu3xHWDEFZwNaAD27Cj/9de34b95a0b4e062edd6673b810d02aa/image4-15.png" />
            
            </figure><p>Today we are announcing new pricing for <a href="https://developers.cloudflare.com/workers/">Cloudflare Workers</a> and <a href="https://developers.cloudflare.com/pages/platform/functions/get-started/">Pages Functions</a>, where you are billed based on CPU time, and never for the idle time that your Worker spends waiting on network requests and other I/O. Unlike other platforms, when you build applications on Workers, you only pay for the compute resources you actually use.</p><p>Why is this exciting? To date, all large serverless compute platforms have billed based on how long your function runs — its duration or “wall time”. This is a reflection of a paradigm built on a leaky abstraction — your code may be neatly packaged up into a “function”, but under the hood there’s a <a href="https://www.cloudflare.com/learning/cloud/what-is-a-virtual-machine/">virtual machine (VM)</a>. A VM can’t be paused and resumed quickly enough to execute another piece of code while it waits on I/O. So while a typical function might take 100ms to run, it might spend only 10ms doing CPU work, like crunching numbers or parsing JSON, with the rest of the time spent waiting on I/O.</p><p><b>This status quo has meant that you are billed for this idle time, while nothing is happening.</b></p><p>With this announcement, Cloudflare is the first and only global serverless platform to offer standard pricing based on CPU time, rather than duration. We think you should only pay for the compute time you actually use, and that’s how we’re going to bill you going forward.</p>
    <div>
      <h3>Old pricing — two pricing models, each with tradeoffs</h3>
      <a href="#old-pricing-two-pricing-models-each-with-tradeoffs">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6x9m1087glgizKpAMBJFW5/4882c9dc0081ca2de6cb3468968ea491/Screenshot-2023-09-25-at-4.45.08-PM.png" />
            
            </figure>
    <div>
      <h3>New pricing — one simple and predictable pricing model</h3>
      <a href="#new-pricing-one-simple-and-predictable-pricing-model">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/25bav4xJzPGs970MMWrWOc/10970d02057b8f0181303482ad2f4e5b/Screenshot-2023-09-25-at-4.56.25-PM.png" />
            
            </figure>
    <div>
      <h3>With the same generous Free plan</h3>
      <a href="#with-the-same-generous-free-plan">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5cqN9JRJhjbfRjU3VU4Dl3/aa68ec49b181b29fdcbb978298abfa57/Screenshot-2023-09-25-at-4.59.55-PM.png" />
            
            </figure><p>Unlike wall time (duration, or GB-s), CPU time is more predictable and under your control. When you make a request to a third-party API, you can’t control how long that API takes to return a response. This time can be quite long, and vary dramatically — particularly when building <a href="https://www.cloudflare.com/learning/ai/what-is-artificial-intelligence/">AI</a> applications that make inference requests to LLMs. If a request takes twice as long to complete, duration-based billing means you pay double. By contrast, CPU time is consistent and unaffected by time spent waiting on I/O — purely a function of your Worker’s logic and the processing it performs on its inputs and outputs. It is entirely under your control.</p><p>Starting October 31, 2023, you will have the option to opt individual <a href="https://developers.cloudflare.com/workers/">Workers</a> and <a href="https://developers.cloudflare.com/pages/platform/functions/">Pages Functions</a> projects on your account into the new pricing, and newly created projects will default to new pricing. You’ll be able to estimate how much new pricing will cost in the Cloudflare dashboard. For the majority of current applications, new pricing is the same or less expensive than the previous Bundled and Unbound pricing plans.</p><p>If you’re on our Workers Paid plan, you will have until March 1, 2024 to switch to the new pricing on your own, after which all of your projects will be automatically migrated to new pricing. If you’re an Enterprise customer, any contract renewals after March 1, 2024, will use the new pricing. You’ll receive plenty of advance notice via email and dashboard notifications before any changes go into effect. And since CPU time is fully in your control, the more you optimize your Worker’s compute time, the less you’ll pay. Your incentives are aligned with ours, to make efficient use of compute resources on Region: Earth.</p>
    <div>
      <h2>The challenge of truly scaling to zero</h2>
      <a href="#the-challenge-of-truly-scaling-to-zero">
        
      </a>
    </div>
    <p>The beauty of serverless is that it allows teams to focus on what matters most — delivering value to their customers, rather than managing infrastructure. It saves you money by effortlessly scaling up and down all over the world based on your traffic, whether you’re an early stage startup or <a href="https://www.cloudflare.com/case-studies/shopify/">Shopify during Black Friday</a>.</p><p>One of the promises of serverless is the idea of scaling to zero — once those big days subside, you no longer have to pay for virtual machines to sit idle before your autoscaling kicks in, or be charged by the hour for instances that you barely ended up using. No compute = no bills for usage. Or so, at least, is the promise of serverless.</p><p>Yet, there’s one hidden cost, where even in the serverless world you will find yourself paying for idle resources — what happens when your function is sitting around waiting on I/O? With pricing based on the duration that a function runs, you’re still billed for time that your service is doing zero work, and just waiting on network requests.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2miAVPZ7it2rhk06NCCNPH/292b6712868fbae8e1050776e751a3ac/image3-20.png" />
            
            </figure><p>Most applications spend far more time waiting on this I/O than they do using the CPU, often <i>ten times more</i>.</p><p>Imagine a similar scenario in your own life — you grab a cab to go to the airport. On the way, the driver decides to stop to refuel and grab a snack, but leaves the meter running. This is not time spent bringing you closer to your destination, but it’s time that you’re paying for. Now imagine that, while the driver was refueling the car, the meter was paused. That’s the difference between CPU time and duration, or wall clock time.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4H873Xpqrx3Qpgd7Oe9CNm/2839487c6dd05a2d3af596ba40e113b0/image1-13.png" />
            
            </figure><p>But rather than waiting on the driver to refuel or grab a Snickers bar, what is it that you’re actually paying for when it comes to serverless compute?</p><p><b>Time spent waiting on services you don’t control</b></p><p>Most applications depend on one or many external service providers. Providers of hosted <a href="https://www.cloudflare.com/learning/ai/what-is-large-language-model/">large language models (LLMs)</a> like GPT-4 or Stable Diffusion. Databases as a service. Payment processors. Or simply an API request to a system outside your control. This is where software development is headed — rather than reinventing the wheel and slowly building everything themselves, both fast-moving startups and the Fortune 500 increasingly build using other services to avoid undifferentiated heavy lifting.</p><p>Every time an application interacts with one of these external services, it has to send data over the network and wait until it receives a response. And while some services are lightning fast, others can take considerable time, like waiting for a payment processor or for a large media file to be uploaded or converted. Your own application sits idle for most of the request, waiting on services outside your control.</p><p>Until today, you’ve had to pay while your application waits. You’ve had to pay more when a service you depend on has an operational issue and slows down, or times out in responding to your request. This has been a disincentive to incrementally move parts of your application to serverless.</p>
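            <p>To make that concrete, here is an illustrative sketch (not specific to any platform) of an I/O-bound handler: its wall-clock duration is dominated by waiting on an external service, while the CPU is only busy for a brief parse.</p>
            <pre><code>// An I/O-bound handler: wall time is dominated by waiting, not CPU work.
async function handle(fetchFromProvider) {
  const started = Date.now();
  const payload = await fetchFromProvider(); // idle: waiting on I/O, no CPU use
  const data = JSON.parse(payload);          // brief burst of actual CPU work
  return { data, wallTimeMs: Date.now() - started };
}

// Stand-in for a slow third-party API that takes ~200ms to respond.
const slowProvider = () =>
  new Promise((resolve) => setTimeout(() => resolve('{"ok":true}'), 200));</code></pre>
            <p>Under duration-based billing you pay for the full ~200ms of wall time; under CPU-time billing you pay only for the brief parse.</p>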
    <div>
      <h3>Cloudflare’s new pricing: the first serverless platform to truly scale down to zero</h3>
      <a href="#cloudflares-new-pricing-the-first-serverless-platform-to-truly-scale-down-to-zero">
        
      </a>
    </div>
    <p>The idea of “scale to zero” is that you never have to keep instances of your application sitting idle, waiting for something to happen. <a href="https://www.cloudflare.com/learning/serverless/what-is-serverless/">Serverless</a> is more than just not having to manage servers or virtual machines — you shouldn’t have to provision and manage the number of compute resources that are available or warm.</p><p>Our new pricing takes the “scale to zero” concept even further, and extends it to whether your application is actually performing work. If you’re still paying while nothing is happening, we don’t think that’s truly scale to zero. Your application is idle. The CPU can be used for other tasks. Whether your application is “running” is an old concept lifted from an era before multi-tenant cloud platforms. What matters is if you are actually using compute resources.</p>
    <div>
      <h2>Pay less, deploy everywhere, without hidden costs</h2>
      <a href="#pay-less-deploy-everywhere-without-hidden-costs">
        
      </a>
    </div>
    <p>Let’s compare what you’d pay on new Workers pricing to AWS Lambda, for the following Worker:</p><ul><li><p>One billion requests per month</p></li><li><p>Seven CPU milliseconds per request</p></li><li><p>200ms duration per request</p></li></ul>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/65SnJPMC26Y0joJF76EJk8/b006ba6aa7a587c76ec6b6bb329fed39/Screenshot-2023-09-25-at-5.16.25-PM.png" />
            
            </figure><p><i>The above table is for informational purposes only. Prices are limited to the public fees as of September 20, 2023, and do not include taxes and any other fees. AWS Lambda and Lambda @ Edge prices are based on publicly available pricing in US-East (Ohio) region as published on </i><a href="https://aws.amazon.com/lambda/pricing/"><i>https://aws.amazon.com/lambda/pricing/</i></a></p><p>Workers are the most cost-effective option, and are <a href="https://www.cloudflare.com/network/">globally distributed</a>, automatically optimized with <a href="https://developers.cloudflare.com/workers/configuration/smart-placement/">Smart Placement</a>, and integrated with <a href="https://developers.cloudflare.com/durable-objects/">Durable Objects</a>, <a href="https://developers.cloudflare.com/r2/">R2</a>, <a href="https://developers.cloudflare.com/workers/learning/how-kv-works/">KV</a>, <a href="https://developers.cloudflare.com/workers/runtime-apis/cache/">Cache</a>, <a href="https://developers.cloudflare.com/queues/">Queues</a>, <a href="https://developers.cloudflare.com/d1/">D1</a> and <a href="https://developers.cloudflare.com/workers/configuration/bindings/">more</a>. And with Workers, you never have to pay extra for provisioned concurrency, pay a penalty for streaming responses, or incur <a href="/aws-egregious-egress/">egregious egress fees</a>.</p>
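<p>As a rough sketch of how a bill is computed under the new model (the rates below are assumptions taken from the Workers Standard pricing published at launch, roughly $0.30 per million requests and $0.02 per million CPU-milliseconds, ignoring included usage and the base subscription):</p>
<pre><code>// Sketch of the new CPU-time-based bill for the example Worker above.
// Rates are assumptions based on publicly listed Workers Standard pricing;
// included free usage and the monthly subscription are ignored.
const PRICE_PER_MILLION_REQUESTS = 0.30; // USD
const PRICE_PER_MILLION_CPU_MS = 0.02;   // USD

function monthlyCost({ requests, cpuMsPerRequest }) {
  const requestCost = (requests / 1e6) * PRICE_PER_MILLION_REQUESTS;
  const cpuCost = ((requests * cpuMsPerRequest) / 1e6) * PRICE_PER_MILLION_CPU_MS;
  return requestCost + cpuCost;
}

// One billion requests at 7 CPU-ms each. Wall-clock duration, whether
// 200ms or 2,000ms, never enters the formula.
const bill = monthlyCost({ requests: 1e9, cpuMsPerRequest: 7 });
// roughly $440: $300 for requests plus $140 for CPU time</code></pre>
<p>Because only requests and CPU time appear in the formula, a Worker that spends two seconds waiting on a slow upstream service costs the same as one whose request completes in 200ms.</p>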
    <div>
      <h3>New Workers pricing makes building AI applications dramatically cheaper</h3>
      <a href="#new-workers-pricing-makes-building-ai-applications-dramatically-cheaper">
        
      </a>
    </div>
    <p>Yesterday we announced a new suite of products to let you build AI applications on Cloudflare — <a href="/best-place-region-earth-inference/">Workers AI</a>, <a href="/announcing-ai-gateway/">AI Gateway</a>, and our new <a href="https://www.cloudflare.com/learning/ai/what-is-vector-database/">vector database</a>, <a href="/vectorize-vector-database-open-beta">Vectorize</a>.</p><p>Nearly everyone is building new products and features using AI models right now. Large language models and <a href="https://www.cloudflare.com/learning/ai/what-is-generative-ai/">generative AI models</a> are incredibly powerful. But they aren’t always fast — asking a model to create an image, transcribe a segment of audio, or write a story often takes multiple seconds — far longer than a typical API response or database query that we expect to return in tens of milliseconds. There is significant compute work going on behind the scenes, and that means longer duration per request to a Worker.</p><p>New Workers pricing makes this much less expensive than it was previously on the Unbound usage model.</p><p>Let’s take the same example as above, but instead assume the duration of the request is two seconds (2000ms), because the Worker makes an inference request to a large AI model. With new Workers pricing, you pay the exact same amount, no matter how long this request takes.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4WzUKUySJqFN7VU3adRMjq/47f5bff96075cda23fe6ce0456e79611/Screenshot-2023-09-25-at-5.23.27-PM.png" />
            
            </figure>
    <div>
      <h3>No surprise bills — set a maximum limit on CPU time for each Worker</h3>
      <a href="#no-surprise-bills-set-a-maximum-limit-on-cpu-time-for-each-worker">
        
      </a>
    </div>
<p>Surprise bills from cloud providers are an unfortunately common horror story. In the old way of provisioning compute resources, forgetting to shut down an instance of a database or virtual machine can cost hundreds of dollars. And accidentally autoscaling up too high can be even worse.</p><p>We’re building new safeguards to prevent these kinds of scenarios on Workers. As part of new pricing, you will be able to cap CPU usage on a per-Worker basis.</p><p>For example, if you have a Worker with a p99 CPU time of 15ms, you might use this to set a max CPU limit of 40ms — enough headroom to ensure that your Worker will run successfully, while ensuring that even if you ship a bug that causes CPU time to ratchet up dramatically, or have an edge case that causes infinite recursion, you can’t suddenly rack up a giant unexpected bill, or be vulnerable to a denial-of-wallet attack. This can be particularly helpful if your Worker handles variable or user-generated input, to guard against edge cases that you haven’t accounted for.</p><p>Alternatively, if you’re running a production service, but want to make sure you stay on top of your costs, we will also be adding the option to configure <a href="https://developers.cloudflare.com/fundamentals/notifications/">notifications</a> that can automatically email you, page you, or send a webhook if your Worker exceeds a particular amount of CPU time per request. You will be able to choose at what threshold you want to be notified, and how.</p>
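<p>As a sketch of what that example could look like, assuming the cap is exposed as a per-Worker setting in wrangler.toml (the <code>[limits]</code> key below is an assumption, since the feature is still being built):</p>
<pre><code># Hypothetical wrangler.toml sketch: cap CPU time per request at 40ms,
# comfortably above the observed p99 of 15ms, so a runaway bug or
# denial-of-wallet attack cannot run up unbounded CPU charges.
name = "my-worker"
main = "src/index.ts"
compatibility_date = "2023-09-25"

[limits]
cpu_ms = 40</code></pre>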
    <div>
      <h3>New ways to “hibernate” Durable Objects while keeping connections alive</h3>
      <a href="#new-ways-to-hibernate-durable-objects-while-keeping-connections-alive">
        
      </a>
    </div>
<p>While Workers are stateless functions, <a href="https://developers.cloudflare.com/durable-objects/">Durable Objects</a> are stateful and long-lived, commonly used to coordinate and persist real-time state in chat, multiplayer games, or collaborative apps. And unlike Workers, duration-based pricing fits Durable Objects well. As long as one or more clients are connected to a Durable Object, it keeps state available in memory. <a href="https://developers.cloudflare.com/durable-objects/platform/pricing/">Durable Objects pricing</a> will remain duration-based, and is not changing as part of this announcement.</p><p>What about when a client is connected to a Durable Object, but no work has happened for a long time? Consider a collaborative whiteboard app built using Durable Objects. A user of the app opens the app in a browser tab, but then forgets about it, and leaves it running for days, with an open WebSocket connection. Just like with Workers, we don’t think you should have to pay for this idle time. But until recently, there hasn’t been an API to signal to us that a Durable Object can be safely “hibernated”.</p><p>The recently introduced <a href="https://developers.cloudflare.com/durable-objects/api/hibernatable-websockets-api/">Hibernation API</a>, currently in beta, allows you to set an automatic response to be used while hibernated and serialize state such that it survives hibernation. This gives Cloudflare the inputs we need in order to maintain open WebSocket connections from clients, while “hibernating” the Durable Object such that it is not actively running, and you are not billed for idle time. The result is that your state is always available in-memory when you actually need it, but isn’t unnecessarily kept around when it’s not. As long as your Durable Object is hibernating, even if there are active clients still connected over a WebSocket, you won’t be billed for duration.</p>
    <div>
      <h3>Snippets make Cloudflare’s CDN programmable — for free</h3>
      <a href="#snippets-make-cloudflares-cdn-programmable-for-free">
        
      </a>
    </div>
    <p>What if you just want to <a href="https://developers.cloudflare.com/workers/examples/alter-headers/">modify a header</a>, do a <a href="https://developers.cloudflare.com/workers/examples/country-code-redirect/">country code redirect</a>, or cache a custom query? Developers have relied on Workers to program Cloudflare’s CDN like this for many years. With the announcement of <a href="/snippets-announcement/">Cloudflare Snippets</a> last year, <a href="/cloudflare-snippets-alpha/">now in alpha</a>, we’re making it free.</p><p>If you use Workers today for these smaller use cases, to customize any of Cloudflare’s <a href="https://www.cloudflare.com/application-services/">application services</a>, Snippets will be the optimal, zero cost option.</p>
    <div>
      <h3>A serverless platform without limits</h3>
      <a href="#a-serverless-platform-without-limits">
        
      </a>
    </div>
<p>Developers are building ever larger and more complex full-stack applications on Workers each month. Our promise to you is to help you scale in any direction, without worrying about paying for idle time or having to manage and provision compute resources across regions.</p><p>This also means not having to worry about limits. Workers already serves many millions of requests per second, and scales and performs so well that we are <a href="/building-cloudflare-on-cloudflare/">rebuilding our own CDN on top of Workers</a>. Individual Workers can now be <a href="https://developers.cloudflare.com/workers/platform/limits/#worker-size">up to 10MB</a>, with a max startup time of 400ms, and can be easily composed together using <a href="https://developers.cloudflare.com/workers/configuration/bindings/about-service-bindings/">Service Bindings</a>. Entire platforms are built on top of Workers, with a growing number of companies allowing their own customers to write and deploy custom code and applications via <a href="https://developers.cloudflare.com/cloudflare-for-platforms/workers-for-platforms/">Workers for Platforms</a>. Some of the biggest platforms in the world rely on Cloudflare and the Workers platform <a href="https://www.cloudflare.com/case-studies/shopify/">during the most critical moments</a>.</p><p>New pricing removes the limits that duration-based billing placed on the types of applications that could be built cost-effectively. It removes the ceiling on CPU time from our original request-based pricing. We’re excited to see what you build, and are committed to being the development platform where you’re not constrained by limits on scale, regions, instances, concurrency, or whatever else you need to handle to grow and operate globally.</p>
    <div>
      <h3>When will new pricing be available?</h3>
      <a href="#when-will-new-pricing-be-available">
        
      </a>
    </div>
    <p>Starting October 31, 2023, you will have the option to opt in individual <a href="https://developers.cloudflare.com/workers/">Workers</a> and <a href="https://developers.cloudflare.com/pages/platform/functions/">Pages Functions</a> projects on your account to new pricing, and newly created projects will default to new pricing. You will have until March 1, 2024, or the end of your Enterprise contract, whichever comes later, to switch to new pricing on your own, after which all of your projects will be automatically migrated to new pricing. You’ll receive plenty of advance notice via email and dashboard notifications before any changes go into effect.</p><p>Between now and then, we want to hear from you. We’ve based new pricing off feedback we’ve heard from developers building serverless applications, and companies estimating and projecting their costs. Tell us what you think of new pricing by <a href="https://forms.gle/STBQRy39CX98p9n18">sharing your feedback in this survey</a>. We read every response.</p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Connectivity Cloud]]></category>
            <guid isPermaLink="false">2qLGARvMRZ2T9X8ccT4fPl</guid>
            <dc:creator>Rita Kozlov</dc:creator>
            <dc:creator>Brendan Irvine-Broque</dc:creator>
        </item>
        <item>
            <title><![CDATA[Announcing connect() — a new API for creating TCP sockets from Cloudflare Workers]]></title>
            <link>https://blog.cloudflare.com/workers-tcp-socket-api-connect-databases/</link>
            <pubDate>Tue, 16 May 2023 13:00:13 GMT</pubDate>
            <description><![CDATA[ Today, we are excited to announce a new API in Cloudflare Workers for creating outbound TCP sockets, making it possible to connect directly to databases and any TCP-based service from Workers ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1CjlPkdLJUXlfgIKgq2Jvy/d2e17e3027c02f82e191007561640f79/image2-12.png" />
            
            </figure><p>Today, we are excited to announce a new API in Cloudflare Workers for creating outbound TCP sockets, making it possible to connect directly to any TCP-based service from Workers.</p><p>Standard protocols including <a href="https://www.cloudflare.com/learning/access-management/what-is-ssh/">SSH</a>, MQTT, SMTP, FTP, and IRC are all built on top of TCP. Most importantly, nearly all applications need to connect to databases, and most databases speak TCP. And while <a href="https://developers.cloudflare.com/d1/">Cloudflare D1</a> works seamlessly on Workers, and some <a href="https://developers.cloudflare.com/workers/learning/integrations/databases/">hosted database providers</a> allow connections over HTTP or WebSockets, the vast majority of databases, both relational (SQL) and document-oriented (NoSQL), require clients to connect by opening a direct TCP “socket”, an ongoing two-way connection that is used to send queries and receive data. Now, Workers provides an API for this, the first of many steps to come in allowing you to use any database or infrastructure you choose when building full-stack applications on Workers.</p><p>Database drivers, the client code used to connect to databases and execute queries, are already using this new API. <a href="https://github.com/brianc/node-postgres">pg</a>, the most widely used JavaScript database driver for PostgreSQL, works on Cloudflare Workers today, with more database drivers to come.</p><p>The TCP Socket API is available today to everyone. Get started by reading the <a href="https://developers.cloudflare.com/workers/runtime-apis/tcp-sockets">TCP Socket API docs</a>, or connect directly to any PostgreSQL database from your Worker by following <a href="https://developers.cloudflare.com/workers/databases/connect-to-postgres/">this guide</a>.</p>
    <div>
      <h2>First — what is a TCP Socket?</h2>
      <a href="#first-what-is-a-tcp-socket">
        
      </a>
    </div>
    <p><a href="https://www.cloudflare.com/learning/ddos/glossary/tcp-ip/">TCP (Transmission Control Protocol)</a> is a foundational networking protocol of the Internet. It is the underlying protocol that is used to make HTTP requests (prior to <a href="https://www.cloudflare.com/learning/performance/what-is-http3/">HTTP/3</a>, which uses <a href="https://cloudflare-quic.com/">QUIC</a>), to send email over <a href="https://www.cloudflare.com/learning/email-security/what-is-smtp/">SMTP</a>, to query databases using database–specific protocols like MySQL, and many other application-layer protocols.</p><p>A TCP socket is a programming interface that represents a two-way communication connection between two applications that have both agreed to “speak” over TCP. One application (ex: a Cloudflare Worker) initiates an outbound TCP connection to another (ex: a database server) that is listening for inbound TCP connections. Connections are established by negotiating a three-way handshake, and after the handshake is complete, data can be sent bi-directionally.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6xxArl43DbexJUoRmw8JrG/0ad545bb25f002a4598d387aca491997/image1-30.png" />
            
            </figure><p>A socket is the programming interface for a single TCP connection — it has both a readable and writable “stream” of data, allowing applications to read and write data on an ongoing basis, as long as the connection remains open.</p>
    <div>
      <h2>connect() — A simpler socket API</h2>
      <a href="#connect-a-simpler-socket-api">
        
      </a>
    </div>
<p>With Workers, we aim to support standard APIs that are supported across browsers and non-browser environments wherever possible, so that as many NPM packages as possible work on Workers without changes, and package authors don’t have to write runtime-specific code. But for TCP sockets, we faced a challenge — there was no clear shared standard across runtimes. Node.js provides the <a href="https://nodejs.org/api/net.html">net</a> and <a href="https://nodejs.org/api/tls.html">tls</a> APIs, but Deno implements a different API — <a href="https://deno.land/api@v1.33.1?s=Deno.connect">Deno.connect</a>. And web browsers do not provide a raw TCP socket API, though a <a href="https://github.com/WICG/direct-sockets/blob/main/docs/explainer.md">WICG proposal</a> does exist, and it is different from both Node.js and Deno.</p><p>We also considered how a TCP socket API could be designed to maximize performance and ergonomics in a serverless environment. Most networking APIs were designed well before serverless emerged, with the assumption that the developer’s application is also the server, responsible for directly configuring TLS options and credentials.</p><p>With this backdrop, we reached out to the community, with a focus on maintainers of database drivers, ORMs and other libraries that create outbound TCP connections. Using this feedback, we’ve tried to incorporate the best elements of existing APIs and proposals, and intend to contribute back to future standards, as part of the <a href="/introducing-the-wintercg/">Web-interoperable Runtimes Community Group (WinterCG)</a>.</p><p>The API we landed on is a simple function, connect(), imported from the new cloudflare:sockets module, that returns an instance of a Socket. Here’s a simple example showing it used to connect to a <a href="https://www.w3.org/People/Bos/PROSA/rep-protocols.html#gopher">Gopher</a> server. Gopher was one of the Internet’s early protocols that relied on TCP/IP, and still works today:</p>
            <pre><code>import { connect } from 'cloudflare:sockets';

export default {
  async fetch(req: Request) {
    const gopherAddr = "gopher.floodgap.com:70";
    const url = new URL(req.url);

    try {
      const socket = connect(gopherAddr);

      const writer = socket.writable.getWriter()
      const encoder = new TextEncoder();
      const encoded = encoder.encode(url.pathname + "\r\n");
      await writer.write(encoded);

      return new Response(socket.readable, { headers: { "Content-Type": "text/plain" } });
    } catch (error) {
      return new Response("Socket connection failed: " + error, { status: 500 });
    }
  }
};</code></pre>
            <p>We think this API design has many benefits that can be realized not just on Cloudflare, but in any serverless environment that adopts this design:</p>
            <pre><code>connect(address: SocketAddress | string, options?: SocketOptions): Socket

declare interface Socket {
  get readable(): ReadableStream;
  get writable(): WritableStream;
  get closed(): Promise&lt;void&gt;;
  close(): Promise&lt;void&gt;;
  startTls(): Socket;
}

declare interface SocketOptions {
  secureTransport?: string;
  allowHalfOpen: boolean;
}

declare interface SocketAddress {
  hostname: string;
  port: number;
}</code></pre>
            
    <div>
      <h3>Opportunistic TLS (StartTLS), without separate APIs</h3>
      <a href="#opportunistic-tls-starttls-without-separate-apis">
        
      </a>
    </div>
    <p>Opportunistic TLS, a pattern of creating an initial insecure connection, and then upgrading it to a secure one that uses TLS, remains common, particularly with database drivers. In Node.js, you must use the <a href="https://nodejs.org/api/net.html#class-netsocket">net</a> API to create the initial connection, and then use the <a href="https://nodejs.org/api/tls.html">tls</a> API to create a new, upgraded connection. In Deno, you pass the original socket to <a href="https://deno.land/api@v1.33.1?s=Deno.startTls">Deno.startTls()</a>, which creates a new, upgraded connection.</p><p>Drawing on a <a href="https://www.w3.org/TR/tcp-udp-sockets/#idl-def-TCPOptions">previous W3C proposal</a> for a TCP Socket API, we’ve simplified this by providing one API, that allows TLS to be enabled, allowed, or used when creating a socket, and exposes a simple method, startTls(), for upgrading a socket to use TLS.</p>
            <pre><code>// Create a new socket without TLS. secureTransport defaults to "off" if not specified.
const socket = connect("address:port", { secureTransport: "off" })

// Create a new socket, then upgrade it to use TLS.
// Once startTls() is called, only the newly created socket can be used.
const socket = connect("address:port", { secureTransport: "starttls" })
const secureSocket = socket.startTls();

// Create a new socket with TLS
const socket = connect("address:port", { secureTransport: "use" })</code></pre>
            
    <div>
      <h3>TLS configuration — a concern of host infrastructure, not application code</h3>
      <a href="#tls-configuration-a-concern-of-host-infrastructure-not-application-code">
        
      </a>
    </div>
    <p>Existing APIs for creating TCP sockets treat TLS as a library that you interact with in your application code. The <a href="https://nodejs.org/api/tls.html#tlscreatesecurecontextoptions">tls.createSecureContext()</a> API from Node.js has a plethora of advanced configuration options that are mostly environment specific. If you use custom certificates when connecting to a particular service, you likely use a different set of credentials and options in production, staging and development. Managing direct file paths to credentials across environments and swapping out .env files in production build steps are common pain points.</p><p>Host infrastructure is best positioned to manage this on your behalf, and similar to Workers support for <a href="/mtls-workers/">making subrequests using mTLS</a>, TLS configuration and credentials for the socket API will be managed via Wrangler, and a connect() function provided via a <a href="https://developers.cloudflare.com/workers/platform/bindings/">capability binding</a>. Currently, custom TLS credentials and configuration are not supported, but are coming soon.</p>
    <div>
      <h3>Start writing data immediately, before the TLS handshake finishes</h3>
      <a href="#start-writing-data-immediately-before-the-tls-handshake-finishes">
        
      </a>
    </div>
    <p>Because the connect() API synchronously returns a new socket, one can start writing to the socket immediately, without waiting for the TCP handshake to first complete. This means that once the handshake completes, data is already available to send immediately, and host platforms can make use of pipelining to optimize performance.</p>
    <div>
      <h2>connect() API + DB drivers = Connect directly to databases</h2>
      <a href="#connect-api-db-drivers-connect-directly-to-databases">
        
      </a>
    </div>
    <p>Many <a href="https://www.cloudflare.com/developer-platform/products/d1/">serverless databases</a> already work on Workers, allowing clients to connect over HTTP or over <a href="/neon-postgres-database-from-workers/">WebSockets</a>. But most databases don’t “speak” HTTP, including databases hosted on most cloud providers.</p><p>Databases each have their own “wire protocol”, and open-source database “drivers” that speak this protocol, sending and receiving data over a TCP socket. Developers rely on these drivers in their own code, as do database ORMs. Our goal is to make sure that you can use the same drivers and ORMs you might use in other runtimes and on other platforms on Workers.</p>
    <div>
      <h2>Try it now — connect to PostgreSQL from Workers</h2>
      <a href="#try-it-now-connect-to-postgresql-from-workers">
        
      </a>
    </div>
    <p>We’ve worked with the maintainers of <a href="https://www.npmjs.com/package/pg">pg</a>, one of the most popular database drivers in the JavaScript ecosystem, used by ORMs including <a href="https://sequelize.org/docs/v6/getting-started/">Sequelize</a> and <a href="https://knexjs.org/">knex.js</a>, to add support for connect().</p><p>You can try this right now. First, create a new Worker and install pg:</p>
            <pre><code>wrangler init
npm install --save pg</code></pre>
            <p>As of this writing, you’ll need to <a href="https://developers.cloudflare.com/workers/wrangler/configuration/#add-polyfills-using-wrangler">enable the node_compat</a> option in wrangler.toml:</p><p><b>wrangler.toml</b></p>
            <pre><code>name = "my-worker"
main = "src/index.ts"
compatibility_date = "2023-05-15"
node_compat = true</code></pre>
            <p>In just 20 lines of TypeScript, you can create a connection to a Postgres database, execute a query, return results in the response, and close the connection:</p><p><b>index.ts</b></p>
            <pre><code>import { Client } from "pg";

export interface Env {
  DB: string;
}

export default {
  async fetch(
    request: Request,
    env: Env,
    ctx: ExecutionContext
  ): Promise&lt;Response&gt; {
    const client = new Client(env.DB);
    await client.connect();
    const result = await client.query({
      text: "SELECT * from customers",
    });
    console.log(JSON.stringify(result.rows));
    const resp = Response.json(result.rows);
    // Close the database connection, but don't block returning the response
    ctx.waitUntil(client.end());
    return resp;
  },
};</code></pre>
<p>To test this in local development, use the <code>--experimental-local</code> flag (instead of <code>--local</code>), which <a href="/miniflare-and-workerd/">uses the open-source Workers runtime</a>, ensuring that what you see locally mirrors behavior in production:</p>
            <pre><code>wrangler dev --experimental-local</code></pre>
            
    <div>
      <h2>What’s next for connecting to databases from Workers?</h2>
      <a href="#whats-next-for-connecting-to-databases-from-workers">
        
      </a>
    </div>
    <p>This is only the beginning. We’re aiming for the two popular MySQL drivers, <a href="https://github.com/mysqljs/mysql">mysql</a> and <a href="https://github.com/sidorares/node-mysql2">mysql2</a>, to work on Workers soon, with more to follow. If you work on a database driver or ORM, we’d love to help make your library work on Workers.</p><p>If you’ve worked more closely with database scaling and performance, you might have noticed that in the example above, a new connection is created for every request. This is one of the biggest current challenges of connecting to databases from serverless functions, across all platforms. With typical client connection pooling, you maintain a local pool of database connections that remain open. This approach of storing a reference to a connection or connection pool in global scope will not work, and is a poor fit for serverless. Managing individual pools of client connections on a per-isolate basis creates other headaches — when and how should connections be terminated? How can you limit the total number of concurrent connections across many isolates and locations?</p><p>Instead, we’re already working on simpler approaches to connection pooling for the most popular databases. We see a path to a future where you don’t have to think about or manage client connection pooling on your own. We’re also working on a brand new approach to making your database reads lightning fast.</p>
    <div>
      <h2>What’s next for sockets on Workers?</h2>
      <a href="#whats-next-for-sockets-on-workers">
        
      </a>
    </div>
<p>Supporting outbound TCP connections is only one half of the story — we plan to support inbound TCP and UDP connections, as well as new emerging application protocols based on QUIC, so that you can build applications beyond HTTP with <a href="/introducing-socket-workers/">Socket Workers</a>.</p><p>Earlier today we also announced <a href="/announcing-workers-smart-placement">Smart Placement</a>, which improves performance by running any Worker that makes multiple HTTP requests to an origin as close to that origin as possible, reducing round-trip time. We’re working on making this work with Workers that open TCP connections, so that if your Worker connects to a database in Virginia and makes many queries over a TCP connection, each query is lightning fast and comes from the nearest location on <a href="https://www.cloudflare.com/network/">Cloudflare’s global network</a>.</p><p>We also plan to support custom certificates and other TLS configuration options in the coming months — tell us what is a must-have in order to connect to the services you need to connect to from Workers.</p>
    <div>
      <h2>Get started, and share your feedback</h2>
      <a href="#get-started-and-share-your-feedback">
        
      </a>
    </div>
    <p>The TCP Socket API is available today to everyone. Get started by reading the <a href="https://developers.cloudflare.com/workers/runtime-apis/tcp-sockets">TCP Socket API docs</a>, or connect directly to any PostgreSQL database from your Worker by following <a href="https://developers.cloudflare.com/workers/databases/connect-to-postgres/">this guide</a>.</p><p>We want to hear your feedback, what you’d like to see next, and more about what you’re building. Join the <a href="https://discord.cloudflare.com/">Cloudflare Developers Discord</a>.</p>
    <div>
      <h3>Watch on Cloudflare TV</h3>
      <a href="#watch-on-cloudflare-tv">
        
      </a>
    </div>
 ]]></content:encoded>
            <category><![CDATA[Developer Week]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[TCP]]></category>
            <category><![CDATA[Database]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">14RexUSLCzOVnpWl5DkGIq</guid>
            <dc:creator>Brendan Irvine-Broque</dc:creator>
            <dc:creator>Matt Silverlock</dc:creator>
        </item>
        <item>
            <title><![CDATA[Bringing the best live video experience to Cloudflare Stream with AV1]]></title>
            <link>https://blog.cloudflare.com/av1-cloudflare-stream-beta/</link>
            <pubDate>Wed, 05 Oct 2022 17:08:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare Stream now supports the AV1 codec for live video in open beta, unlocking live-streaming at higher resolution, with lower bandwidth ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Consumer hardware is pushing the limits of consumers’ bandwidth.</p><p>VR headsets support 5760 x 3840 resolution — 22.1 million pixels <i>per frame</i> of video. Nearly all new TVs and smartphones sold today now support 4K — 8.8 million pixels per frame. It’s now normal for most people on a subway to be casually streaming video on their phone, even as they pass through a tunnel. People expect all of this to just work, and get frustrated when it doesn’t.</p><p>Consumer Internet bandwidth hasn’t kept up. Even advanced mobile carriers still limit streaming video resolution to prevent network congestion. Many mobile users still have to monitor and limit their mobile data usage. Higher Internet speeds require expensive infrastructure upgrades, and 30% of Americans still say they often have problems <a href="https://www.pewresearch.org/internet/2021/06/03/mobile-technology-and-home-broadband-2021/">simply connecting to the Internet at home</a>.</p><p>We talk to developers every day who are pushing up against these limits, trying to deliver the highest quality streaming video without buffering or jitter, challenged by viewers’ expectations and bandwidth. Developers building live video experiences hit these limits the hardest — buffering doesn’t just delay video playback, it can cause the viewer to get out of sync with the live event. Buffering can cause a sports fan to miss a key moment as playback suddenly skips ahead, or find out in a text message about the outcome of the final play, before they’ve had a chance to watch.</p><p>Today we’re announcing a big step towards breaking the ceiling of these limits — support in <a href="https://www.cloudflare.com/products/cloudflare-stream/">Cloudflare Stream</a> for the <a href="https://aomedia.org/av1-features/">AV1</a> codec for live videos and their recordings, available today to all Cloudflare Stream customers in open beta. 
Read <a href="https://developers.cloudflare.com/stream/viewing-videos/av1-playback/">the docs</a> to get started, or <a href="https://cool-sf-videos.pages.dev/">watch an AV1 video</a> from Cloudflare Stream in your web browser. AV1 is an open and royalty-free video codec that uses <a href="https://engineering.fb.com/2018/04/10/video-engineering/av1-beats-x264-and-libvpx-vp9-in-practical-use-case/">46% less bandwidth than H.264</a>, the most commonly used video codec on the web today.</p>
    <div>
      <h3>What is AV1, and how does it improve live video streaming?</h3>
      <a href="#what-is-av1-and-how-does-it-improve-live-video-streaming">
        
      </a>
    </div>
    <p>Every piece of information that travels across the Internet, from web pages to photos, requires data to be transmitted between two computers. A single character usually takes one byte, so a two-page letter would be 3600 bytes or 3.6 kilobytes of data transferred.</p><p>One pixel in a photo takes 3 bytes, one each for red, green and blue in the pixel. A 4K photo at 3840 × 2160 would take 24,883,200 bytes, or about 25 megabytes. A video is like a photo that changes 30 times a second, which would make almost 45 gigabytes per minute. That’s a lot!</p><p>To reduce the amount of bandwidth needed to stream video, before video is sent to your device, it is compressed using a codec. When your device receives video, it decodes this into the pixels displayed on your screen. These codecs are essential to both streaming and storing video.</p><p>Video compression codecs combine multiple advanced techniques, and are able to compress video to one percent of the original size, with your eyes barely noticing a difference. This also makes video codecs computationally intensive and hard to run. Smartphones, laptops and TVs have specific media decoding hardware, separate from the main CPU, optimized to decode specific protocols quickly, using the minimum amount of battery life and power.</p><p>Every few years, as researchers invent more efficient compression techniques, standards bodies release new codecs that take advantage of these improvements. Each generation of improvements in compression technology increases the requirements for computers that run them. With higher requirements, new chips are made available with increased compute capacity. These new chips allow your device to display higher quality video while using less bandwidth.</p><p>AV1 takes advantage of recent advances in compute to deliver video with dramatically fewer bytes, even compared to other relatively recent video codecs like VP9 and HEVC.</p>
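<p>As a sanity check, the arithmetic can be reproduced in a few lines of JavaScript, assuming 3 bytes per pixel at 3840 × 2160 and 30 frames per second:</p>

```javascript
// Back-of-envelope math for uncompressed 4K video:
// 3 bytes per pixel, 3840 x 2160 resolution, 30 frames per second.
const width = 3840;
const height = 2160;
const bytesPerPixel = 3;

const bytesPerFrame = width * height * bytesPerPixel; // 24,883,200 bytes
const bytesPerMinute = bytesPerFrame * 30 * 60;       // 30 fps for 60 seconds

console.log((bytesPerFrame / 1e6).toFixed(1));  // "24.9" (MB per frame)
console.log((bytesPerMinute / 1e9).toFixed(1)); // "44.8" (GB per minute)

// Even compressed to one percent of the original size, a minute of
// raw 4K still leaves hundreds of megabytes to move.
console.log((bytesPerMinute * 0.01 / 1e6).toFixed(0)); // "448" (MB per minute)
```

<p>Codecs like AV1 exist to close exactly this gap between raw pixel data and what real-world connections can carry.</p>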
    <div>
      <h3>AV1 leverages the power of new smartphone chips</h3>
      <a href="#av1-leverages-the-power-of-new-smartphone-chips">
        
      </a>
    </div>
    <p>One of the biggest developments of the past few years has been the rise of custom chip designs for smartphones. Much of what’s driven the development of these chips is the need for advanced on-device image and video processing, as companies compete on the basis of which smartphone has the best camera.</p><p>This means the phones we carry around have an incredible amount of compute power. One way to think about AV1 is that it shifts work from the network to the viewer’s device. AV1 is fewer bytes over the wire, but computationally harder to decode than prior formats. When AV1 was first announced in 2018, it was dismissed by some as too slow to encode and decode, but smartphone chips have become radically faster in the past four years, more quickly than many saw coming.</p><p>AV1 hardware decoding is already built into the latest Google Pixel smartphones as part of the Tensor chip design. The <a href="https://semiconductor.samsung.com/processor/mobile-processor/exynos-2200/">Samsung Exynos 2200</a> and <a href="https://www.mediatek.com/products/smartphones-2/dimensity-1000-series">MediaTek Dimensity 1000 SoC</a> mobile chipsets both support hardware accelerated AV1 decoding. It appears that Google will <a href="https://android.googlesource.com/platform/cts/+/9203e0379bbb8991cdfee39e2a894d236bfaca8e?cf_target_id=EB3A10F16F1C7B0D8AE3D87D702DDC4A">require</a> that all devices that support Android 14 support decoding AV1. And AVPlayer, the media playback API built into iOS and tvOS, now <a href="https://developer.apple.com/documentation/coremedia/1564239-video_codec_constants/kcmvideocodectype_av1">includes an option for AV1</a>, which hints at future support. It’s clear that the industry is heading towards hardware-accelerated AV1 decoding in the most popular consumer devices.</p><p>With hardware decoding comes battery life savings — essential for both today’s smartphones and tomorrow’s VR headsets. 
For example, a Google Pixel 6 with AV1 hardware decoding uses only minimal battery and CPU to decode and play our <a href="https://cool-sf-videos.pages.dev/">test video</a>:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1vbKm1xvDV6JE2Z9Jkn0rO/b0809152c1eb4f589c098530f254f223/image1-5.png" />
            
            </figure>
    <div>
      <h3>AV1 encoding requires even more compute power</h3>
      <a href="#av1-encoding-requires-even-more-compute-power">
        
      </a>
    </div>
    <p>Just as decoding is significantly harder for end-user devices, it is also significantly harder to encode video using AV1. When AV1 was announced in 2018, many doubted that hardware would be able to encode it efficiently enough for the codec to be adopted quickly.</p><p>To demonstrate this, we encoded the 4K rendering of <a href="https://peach.blender.org/">Big Buck Bunny</a> (a classic among video engineers!) into AV1, using an AMD EPYC 7642 48-Core Processor with 256 GB RAM. This CPU continues to be a workhorse of our compute fleet, as we have <a href="/an-epyc-trip-to-rome-amd-is-cloudflares-10th-generation-edge-server-cpu/">written about previously</a>. We used the following command to re-encode the video, <a href="https://trac.ffmpeg.org/wiki/Encode/AV1">based on the example in the ffmpeg AV1 documentation</a>:</p><p><code>ffmpeg -i bbb_sunflower_2160p_30fps_normal.mp4 -c:v libaom-av1 -crf 30 -b:v 0 -strict -2 av1_test.mkv</code></p><p>Using a single core, encoding just two seconds of video at 30fps took over 30 minutes. Even if all 48 cores were used to encode, it would take at minimum over 43 seconds to encode just two seconds of video. Live encoding using only CPUs would require over 20 servers running at full capacity.</p><p>Special-purpose AV1 software encoders like <a href="https://github.com/xiph/rav1e">rav1e</a> and <a href="https://gitlab.com/AOMediaCodec/SVT-AV1">SVT-AV1</a> that run on general-purpose CPUs can encode somewhat faster than <a href="https://trac.ffmpeg.org/wiki/Encode/AV1">libaom-av1</a> with ffmpeg, but still consume a huge amount of compute power to encode AV1 in real-time, requiring multiple servers running at full capacity in many scenarios.</p>
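<p>Treating the measurement above as exactly 30 minutes of single-core CPU time per two seconds of video, quick arithmetic shows the scale of the problem (the perfect-parallelism assumption below is generous, so the real server count lands a bit higher, in line with the “over 20 servers” estimate):</p>

```javascript
// How far from real-time is single-core AV1 software encoding?
// Assumes ~30 minutes (1800 s) of CPU time per 2 seconds of 4K video.
const encodeSeconds = 30 * 60;
const secondsOfVideo = 2;
const coresPerServer = 48; // AMD EPYC 7642

// One core runs at 1/900th of real-time speed.
const realtimeFactor = encodeSeconds / secondsOfVideo;
console.log(realtimeFactor); // 900

// Even with perfect scaling across all 48 cores, one server is still
// roughly 19x too slow, so CPU-only live encoding needs ~20 servers.
const serversNeeded = Math.ceil(realtimeFactor / coresPerServer);
console.log(serversNeeded); // 19
```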
    <div>
      <h3>Cloudflare Stream encodes your video to AV1 in real-time</h3>
      <a href="#cloudflare-stream-encodes-your-video-to-av1-in-real-time">
        
      </a>
    </div>
    <p>At Cloudflare, we control both the hardware and software on our network. To solve the CPU constraint, we’ve installed dedicated AV1 hardware encoders, designed specifically to encode AV1 at blazing fast speeds. This end-to-end control is what lets us encode your video to AV1 in real-time. This is entirely out of reach for most public cloud customers, including the video infrastructure providers who depend on them for compute power.</p><p>Encoding in real-time means you can use AV1 for live video streaming, where saving bandwidth matters most. With a pre-recorded video, the client video player can fetch future segments of video well in advance, relying on a buffer that can be many tens of seconds long. With live video, buffering is constrained by latency — it’s not possible to build up a large buffer when viewing a live stream. There is less margin for error with live streaming, and every byte saved means that if a viewer’s connection is interrupted, it takes less time to recover before the buffer is empty.</p>
    <div>
      <h3>Stream lets you support AV1 with no additional work</h3>
      <a href="#stream-lets-you-support-av1-with-no-additional-work">
        
      </a>
    </div>
    <p>AV1 faces a chicken-and-egg dilemma. And we’re helping solve it.</p><p>Companies with large video libraries often re-encode their entire content library to a new codec before using it. But AV1 is so computationally intensive that re-encoding whole libraries has been cost-prohibitive. Companies have to choose specific videos to re-encode, and guess which content will be most viewed ahead of time. This is particularly challenging for apps with user-generated content, where content can suddenly go viral, and viewer patterns are hard to anticipate.</p><p>This has slowed down the adoption of AV1 — content providers wait for more devices to support AV1, and device manufacturers wait for more content to use AV1. Which will come first?</p><p>With Cloudflare Stream there is no need to manually trigger re-encoding, re-upload video, or manage the bulk encoding of a large video library. This is a unique approach that is made possible by integrating encoding and delivery into a single product — it is not possible to encode on-demand using the old way of encoding first, and then pointing a CDN at a bucket of pre-encoded files.</p><p>We think this approach can accelerate the adoption of AV1. Consider a video app with millions of minutes of user-generated video. Most videos will never be watched again. In the old model, developers would have to spend huge sums of money to encode upfront, or pick and choose which videos to re-encode. With Stream, we can help anyone incrementally adopt AV1, without re-encoding upfront. As we work towards making AV1 Generally Available, we’ll be working to make supporting AV1 simple and painless, even for videos already uploaded to Stream, with no special configuration necessary.</p>
    <div>
      <h3>Open, royalty-free, and widely supported</h3>
      <a href="#open-royalty-free-and-widely-supported">
        
      </a>
    </div>
    <p>At Cloudflare, we are committed to open standards and <a href="/tag/patent-troll/">fighting patent trolls</a>. While there are multiple competing options for new video codecs, we chose to support AV1 first in part because it is open source and has royalty-free licensing.</p><p>Other codecs force device manufacturers to pay royalty fees in order to adopt their standard in consumer hardware, and have been quick to file lawsuits against competing video codecs. The group behind the open and royalty-free VP8 and VP9 codecs has been pushing back against this model for more than a decade, and AV1 is the successor to these codecs, with support from all the <a href="https://aomedia.org/membership/members/">biggest technology companies</a>, both software and hardware. Beyond its technical accomplishments, AV1 is a clear message from the industry that the future of video encoding should be open, royalty-free, and free from patent litigation.</p>
    <div>
      <h3>Try AV1 right now with <b><i>your</i></b> live stream or live recording</h3>
      <a href="#try-av1-right-now-with-your-live-stream-or-live-recording">
        
      </a>
    </div>
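<p>Enabling the beta for a given player is just a matter of appending the <code>betaCodecSuggestion=av1</code> query parameter to a manifest URL, for example with the standard URL API (the hostname and video ID below are hypothetical placeholders):</p>

```javascript
// Append the AV1 beta flag to an HLS manifest URL.
// The hostname and video ID are placeholders; use your own manifest URL
// from the Cloudflare Stream dashboard.
const manifest = new URL(
  "https://customer-example.cloudflarestream.com/abc123/manifest/video.m3u8"
);
manifest.searchParams.set("betaCodecSuggestion", "av1");

console.log(manifest.toString());
// https://customer-example.cloudflarestream.com/abc123/manifest/video.m3u8?betaCodecSuggestion=av1
```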
    <p>Support for AV1 is currently in open beta. You can try using AV1 on your own live video with Cloudflare Stream right now — just add the <code>?betaCodecSuggestion=av1</code> query parameter to the HLS or DASH manifest URL for any live stream or live recording created after October 1st in Cloudflare Stream. <a href="https://developers.cloudflare.com/stream/viewing-videos/av1-playback/">Read the docs</a> to get started. If you don’t yet have a Cloudflare account, you can sign up <a href="https://dash.cloudflare.com/sign-up/stream">here</a> and start using Cloudflare Stream in just a few minutes.</p><p>We also have a recording of a live video, encoded using AV1, that you can watch <a href="https://cool-sf-videos.pages.dev/">here</a>. Note that Safari does not yet support AV1.</p><p>We encourage you to try AV1 with your test streams, and we’d love your feedback. Join our <a href="https://discord.com/invite/cloudflaredev/">Discord channel</a> and tell us what you’re building, and what kinds of video you’re interested in using AV1 with. We’d love to hear from you!</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Stream]]></category>
            <category><![CDATA[Video]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Product News]]></category>
            <guid isPermaLink="false">1VBAndDQEb6dfu5nhqoTvb</guid>
            <dc:creator>Renan Dincer</dc:creator>
            <dc:creator>Brendan Irvine-Broque</dc:creator>
            <dc:creator>Chris Howells</dc:creator>
            <dc:creator>Ryan Schachte</dc:creator>
        </item>
        <item>
            <title><![CDATA[WebRTC live streaming to unlimited viewers, with sub-second latency]]></title>
            <link>https://blog.cloudflare.com/webrtc-whip-whep-cloudflare-stream/</link>
            <pubDate>Tue, 27 Sep 2022 13:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare Stream now supports live streaming over WebRTC to unlimited concurrent viewers, using open standards WHIP and WHEP, with zero dependencies or client SDKs necessary. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Creators and broadcasters expect to be able to go live from anywhere, on any device. Viewers expect “live” to mean “real-time”. The protocols that power most live streams are unable to meet these growing expectations.</p><p>In talking to developers building live streaming into their apps and websites, we’ve heard near universal frustration with the limitations of existing live streaming technologies. Developers in 2022 rightly expect to be able to deliver low latency to viewers, broadcast reliably, and use web standards rather than old protocols that date back to the era of Flash.</p><p>Today, we’re excited to announce in open beta that Cloudflare Stream now supports live video streaming over WebRTC, with sub-second latency, to unlimited concurrent viewers. This is a new feature of Cloudflare Stream, and you can start using it right now in the Cloudflare Dashboard — read the <a href="https://developers.cloudflare.com/stream/webrtc-beta/">docs</a> to get started.</p><p>WebRTC with Cloudflare Stream leapfrogs existing tools and protocols, exclusively uses open standards with zero dependency on a specific SDK, and empowers any developer to build both low latency live streaming and playback into their website or app.</p>
    <div>
      <h3>The status quo of streaming live video is broken</h3>
      <a href="#the-status-quo-of-streaming-live-video-is-broken">
        
      </a>
    </div>
    <p>The status quo of streaming live video has high latency, depends on archaic protocols and is incompatible with the way developers build apps and websites. A reasonable person’s expectations of what the Internet should be able to deliver in 2022 are simply unmet by the dominant set of protocols carried over from past eras.</p><p><b>Viewers</b> increasingly expect “live” to mean “real-time”. People want to place bets on sports broadcasts in real-time, interact and ask questions to presenters in real-time, and never feel behind their friends at a live event.</p><p>In practice, the HLS and DASH standards used to deliver video have 10+ seconds of latency. LL-HLS and LL-DASH bring this down to closer to 5 seconds, but only as a hack on top of the existing protocol that delivers segments of video in individual HTTP requests. Sending mini video clips over TCP simply cannot deliver video in real-time. HLS and DASH are here to stay, but aren’t the future of real-time live video.</p><p><b>Creators and broadcasters</b> expect to be able to go live from anywhere, on any device.</p><p>In practice, people creating live content are stuck with a limited set of native apps, and can’t go live using RTMP from a web browser. Because it’s built on top of TCP, the RTMP broadcasting protocol struggles under even the slightest network disruption, making it a poor or often unworkable option when broadcasting from mobile networks. 
RTMP, originally built for use with Adobe Flash Player, was <a href="https://rtmp.veriskope.com/pdf/rtmp_specification_1.0.pdf">last updated in 2012</a>, and while Stream supports the <a href="/magic-hdmi-cable/">newer SRT protocol</a>, creators need an option that works natively on the web and can more easily be integrated in native apps.</p><p><b>Developers</b> expect to be able to build using standard APIs that are built into web browsers and native apps.</p><p>In practice, RTMP can’t be used from a web browser, and creating a native app that supports RTMP broadcasting typically requires diving into lower-level programming languages like C and Rust. Only those with expertise in both live video protocols and these languages have full access to the tools needed to create novel live streaming client applications.</p>
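<p>The 10+ second HLS/DASH latency mentioned above isn’t an implementation flaw; it falls directly out of how players buffer segments before starting playback. A sketch, assuming common defaults of six-second segments and a three-segment startup buffer (both are typical player defaults, not fixed by the specs):</p>

```javascript
// Latency floor for segmented HTTP streaming:
// at least (segments buffered) x (segment duration) behind live.
const segmentSeconds = 6;   // common HLS segment length (assumption)
const segmentsBuffered = 3; // typical player startup buffer (assumption)

const hlsLatencySeconds = segmentSeconds * segmentsBuffered;
console.log(hlsLatencySeconds); // 18

// LL-HLS shrinks the transfer unit to ~1 s partial segments, which is why
// it lands closer to 3-5 s, but it still rides on request/response over TCP.
const llhlsLatencySeconds = 1 * 3;
console.log(llhlsLatencySeconds); // 3
```

<p>WebRTC avoids this floor entirely by pushing media continuously rather than in discrete HTTP-fetched segments.</p>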
    <div>
      <h3>We’re solving this by using new open WebRTC standards: WHIP and WHEP</h3>
      <a href="#were-solving-this-by-using-new-open-webrtc-standards-whip-and-whep">
        
      </a>
    </div>
    <p>WebRTC is the real-time communications protocol, supported across all web browsers, that powers video calling services like Zoom and Google Meet. Since inception it’s been designed for real-time, ultra low-latency communications.</p><p>While WebRTC is well established, for most of its history it’s lacked standards for:</p><ul><li><p><b>Ingestion</b> — how broadcasters should <b><i>send</i></b> media content (akin to RTMP today)</p></li><li><p><b>Egress</b> — how viewers request and <b><i>receive</i></b> media content (akin to DASH or HLS today)</p></li></ul><p>As a result, developers have had to implement this on their own, and client applications on both sides are often tightly coupled to provider-specific implementations. Developers we talk to often express frustration, having sunk months of engineering work into building around a specific vendor’s SDK, unable to switch without a significant rewrite of their client apps.</p><p>At Cloudflare, our mission is broader — we’re helping to build a better Internet. Today we’re launching not just a new feature of Cloudflare Stream, but a vote of confidence in new WebRTC standards for both ingestion and egress. We think you should be able to start using Stream without feeling locked into an SDK or implementation specific to Cloudflare, and we’re committed to using open standards whenever possible.</p><p>For ingestion, <a href="https://www.ietf.org/archive/id/draft-ietf-wish-whip-03.html">WHIP</a> is an IETF draft on the Standards Track, with many applications already successfully using it in production. For delivery (egress), <a href="https://www.ietf.org/id/draft-murillo-whep-00.html">WHEP</a> is an IETF draft with broad agreement. Combined, they provide a standardized end-to-end way to broadcast one-to-many over WebRTC at scale.</p><p><b>Cloudflare Stream is the first cloud service to let you both broadcast using WHIP and playback using WHEP — no vendor-specific SDK needed.</b> Here’s how it works:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/19Hq5GYMLCWmxGoKQBiifO/0cd900cb0c07cfac25f14c2486a3cb77/image2-44.png" />
            
            </figure><p>Cloudflare Stream is already built on top of the Cloudflare developer platform, using Workers and Durable Objects running on Cloudflare’s global network, within 50ms of 95% of the world’s Internet-connected population.</p><p>Our WebRTC implementation extends this to relay WebRTC video through our network. Broadcasters stream video using WHIP to the point of presence closest to their location, which tells the Durable Object where the live stream can be found. Viewers request streaming video from the point of presence closest to them, which asks the Durable Object where to find the stream, and video is routed through Cloudflare’s network, all with sub-second latency.</p><p>Using Durable Objects, we achieve this with zero centralized state. And just like the rest of Cloudflare Stream, you never have to think about regions, both in terms of pricing and product development.</p><p>While existing ultra low-latency streaming providers charge significantly more to stream over WebRTC, because Stream runs on Cloudflare’s global network, we’re able to offer WebRTC streaming at the same price as delivering video over HLS or DASH. We don’t think you should be penalized with higher pricing when choosing which technology to rely on to stream live video. Once generally available, WebRTC streaming will cost $1 per 1000 minutes of video delivered, just like the rest of Stream.</p>
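<p>Because pricing is per minute of video delivered rather than per GB, estimating cost is straightforward. A sketch at the stated $1 per 1,000 minutes rate (the audience size and stream length below are made-up illustrations):</p>

```javascript
// Estimate WebRTC delivery cost at $1 per 1,000 minutes delivered.
const pricePerThousandMinutes = 1; // USD, the rate quoted for GA
const viewers = 5000;              // illustrative assumption
const streamMinutes = 90;          // e.g. a 90-minute live event

const minutesDelivered = viewers * streamMinutes;
const costUSD = (minutesDelivered / 1000) * pricePerThousandMinutes;

console.log(minutesDelivered); // 450000
console.log(costUSD);          // 450
```

<p>Note that resolution never appears in the formula — delivering higher-quality video costs the same per minute.</p>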
    <div>
      <h3>What does sub-second latency let you build?</h3>
      <a href="#what-does-sub-second-latency-let-you-build">
        
      </a>
    </div>
    <p>Ultra low latency unlocks interactivity within your website or app, removing the time delay between creators, in-person attendees, and those watching remotely.</p><p>Developers we talk to are building everything from live sports betting, to live auctions, to live viewer Q&amp;A and even real-time collaboration in video post-production. Even streams without in-app interactivity can benefit from real-time — no sports fan wants to get a text from their friend at the game that ruins the moment, before they’ve had a chance to watch the final play. Whether you’re bringing an existing app or have a new idea in mind, we’re excited to see what you build.</p>
    <div>
      <h3>If you can write JavaScript, you can let your users go live from their browser</h3>
      <a href="#if-you-can-write-javascript-you-can-let-your-users-go-live-from-their-browser">
        
      </a>
    </div>
    <p>While hobbyist and professional creators might take the time to download and learn how to use an application like <a href="https://obsproject.com/">OBS Studio</a>, most Internet users won’t get past the friction of installing new tools and copying RTMP keys from one tool to another. To empower more people to go live, they need to be able to broadcast from within your website or app, just by enabling access to the camera and microphone.</p><p>Cloudflare Stream with WebRTC lets you build live streaming into your app as a front-end developer, without any special knowledge of video protocols. And our approach, using the WHIP and WHEP open standards, means you can do this with zero dependencies, using code that is 100% yours and under your control.</p>
    <div>
      <h3>Go live from a web browser with just a few lines of code</h3>
      <a href="#go-live-from-a-web-browser-with-just-a-few-lines-of-code">
        
      </a>
    </div>
    <p>You can go live right now, from your web browser, by creating a live input in the <a href="https://dash.cloudflare.com/?to=/:account/stream/inputs">Cloudflare Stream dashboard</a>, and pasting a URL into the example linked below.</p><p>Read the <a href="https://developers.cloudflare.com/stream/webrtc-beta/">docs</a> or <a href="https://workers.new/stream/webrtc">run the example code below in your browser using Stackblitz</a>.</p>
            <pre><code>&lt;video id="input-video" autoplay autoplay muted&gt;&lt;/video&gt;</code></pre>
            
            <pre><code>import WHIPClient from "./WHIPClient.js";

const url = "&lt;WEBRTC_URL_FROM_YOUR_LIVE_INPUT&gt;";
const videoElement = document.getElementById("input-video");
const client = new WHIPClient(url, videoElement);</code></pre>
            <p>This example uses a WHIP client we’ve written in just 100 lines of JavaScript, using APIs that are native to web browsers, with zero dependencies. But because WHIP is an open standard, you can use any WHIP client you choose. Support for WHIP is growing across the video streaming industry — it has recently been added to <a href="https://gstreamer.freedesktop.org/">GStreamer</a>, and one of the authors of the WHIP specification has written a <a href="https://github.com/medooze/whip-js">JavaScript client implementation</a>. We intend to support the full <a href="https://www.ietf.org/archive/id/draft-ietf-wish-whip-03.html">WHIP specification</a>, including support for <a href="https://www.rfc-editor.org/rfc/rfc8838">Trickle ICE</a> for fast NAT traversal.</p>
    <div>
      <h3>Play a live stream in a browser, with sub-second latency, no SDK required</h3>
      <a href="#play-a-live-stream-in-a-browser-with-sub-second-latency-no-sdk-required">
        
      </a>
    </div>
    <p>Once you’ve started streaming, copy the playback URL from the live input you just created, and paste it into the example linked below.</p><p>Read the <a href="https://developers.cloudflare.com/stream/webrtc-beta/">docs</a> or <a href="https://workers.new/stream/webrtc">run the example code below in your browser using Stackblitz</a>.</p>
            <pre><code>&lt;video id="playback" controls autoplay muted&gt;&lt;/video&gt;</code></pre>
            
            <pre><code>import WHEPClient from './WHEPClient.js';
const url = "&lt;WEBRTC_PLAYBACK_URL_FROM_YOUR_LIVE_INPUT&gt;";
const videoElement = document.getElementById("playback");
const client = new WHEPClient(url, videoElement);</code></pre>
            <p>Just like the WHIP example before, this one uses an example WHEP client we’ve written that has zero dependencies. WHEP is an earlier IETF draft than WHIP, <a href="https://www.ietf.org/id/draft-murillo-whep-00.html">published in July of this year</a>, but adoption is moving quickly. People in the community have already written open-source client implementations in both <a href="https://github.com/medooze/whip-js/blob/main/whep.js">JavaScript</a> and <a href="https://github.com/meetecho/simple-whep-client">C</a>, with more to come.</p>
    <div>
      <h3>Start experimenting with real-time live video, in open beta today</h3>
      <a href="#start-experimenting-with-real-time-live-video-in-open-beta-today">
        
      </a>
    </div>
    <p>WebRTC streaming is in open beta today, ready for you to use as an integrated feature of <a href="https://www.cloudflare.com/products/cloudflare-stream/">Cloudflare Stream</a>. Once Generally Available, WebRTC streaming will be priced like the rest of Cloudflare Stream, based on minutes of video delivered and minutes stored.</p><p><a href="https://developers.cloudflare.com/stream/webrtc-beta/">Read the docs</a> to get started.</p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Cloudflare Stream]]></category>
            <category><![CDATA[Video]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Developers]]></category>
            <guid isPermaLink="false">5PQXX1PxT5vsDahi24H7Cn</guid>
            <dc:creator>Kyle Boutette</dc:creator>
            <dc:creator>Kenny Luong</dc:creator>
            <dc:creator>Brendan Irvine-Broque</dc:creator>
            <dc:creator>Jacob Curtis</dc:creator>
            <dc:creator>Rachel Chen</dc:creator>
            <dc:creator>Felipe Astroza Araya</dc:creator>
            <dc:creator>Renan Dincer</dc:creator>
        </item>
        <item>
            <title><![CDATA[Stream Live is now Generally Available]]></title>
            <link>https://blog.cloudflare.com/stream-live-ga/</link>
            <pubDate>Wed, 21 Sep 2022 13:15:00 GMT</pubDate>
            <description><![CDATA[ Stream Live is now out of beta, available to everyone, and ready for production traffic at scale ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Today, we’re excited to announce that Stream Live is out of beta, available to everyone, and ready for production traffic at scale. Stream Live is a feature of <a href="https://www.cloudflare.com/products/cloudflare-stream/">Cloudflare Stream</a> that allows developers to build live video features in websites and native apps.</p><p>Since its <a href="/stream-live/">beta launch</a>, developers have used Stream to broadcast live concerts from some of the world’s most popular artists directly to fans, build brand-new video creator platforms, operate a global 24/7 live OTT service, and more. While in beta, Stream has ingested millions of minutes of live video and delivered to viewers all over the world.</p><p><b>Bring your big live events, ambitious new video subscription service, or the next mobile video app with millions of users — we’re ready for it.</b></p>
    <div>
      <h2>Streaming live video at scale is hard</h2>
      <a href="#streaming-live-video-at-scale-is-hard">
        
      </a>
    </div>
    <p><b>Live video uses a massive amount of bandwidth.</b> For example, a one-hour live stream at 1080p and 8 Mbps is 3.6 GB per viewer. At <a href="/aws-egregious-egress/">typical cloud provider egress prices</a>, multiply that by your audience and the egress bill adds up fast.</p><p><b>Live video must be encoded on-the-fly, in real-time.</b> People expect to be able to watch live video on their phones, while connected to mobile networks with less bandwidth, higher latency and frequently interrupted connections. To support this, live video must be re-encoded in real-time into multiple resolutions, allowing someone’s phone to drop to a lower resolution and continue playback. This can be complex (Which bitrates? Which codecs? How many?) and costly: running a fleet of virtual machines doesn’t come cheap.</p><p><b>Ingest location matters</b> — Live streaming protocols like RTMPS send video over TCP. If a single packet is lost, the entire connection stalls while the loss is detected and the packet re-transmitted. This is known as “head-of-line blocking”. The further the broadcaster is from the ingest server, the more network hops, and the more likely packets are to be lost, ultimately resulting in latency and buffering for viewers.</p><p><b>Delivery location matters</b> — Live video must be cached and served from points of presence as close to viewers as possible. The longer the network round trips, the more likely videos will buffer or drop to a lower quality.</p><p><b>Broadcasting protocols are in flux</b> — The most widely used protocol for streaming live video, RTMPS, has been abandoned since 2012, and dates back to the era of Flash video in the early 2000s. A newer emerging standard, SRT, is not yet supported everywhere. And WebRTC has only recently evolved into an option for high-definition one-to-many broadcasting at scale.</p><p>The old way to solve this has been to stitch together separate cloud services from different vendors. One vendor provides excellent content delivery, but no encoding. Another provides APIs or hardware to encode, but leaves you to fend for yourself and build your own storage layer. As a developer, you have to learn the esoteric details of video streaming protocols, codecs, encoding settings and delivery pipelines, and then write and maintain a layer of glue code around them.</p>
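<p>The bandwidth arithmetic above is easy to check. Here is a quick back-of-envelope sketch; the per-GB egress price is a hypothetical placeholder for illustration, not any specific provider’s rate:</p>

```python
# Back-of-envelope cost of delivering one hour of 1080p live video.
BITRATE_MBPS = 8   # 1080p bitrate from the example above
SECONDS = 60 * 60  # one hour

# 8 Mbit/s * 3600 s = 28,800 Mbit; divide by 8 for MB, then by 1000 for GB.
gigabytes = BITRATE_MBPS * SECONDS / 8 / 1000

# Hypothetical per-GB egress price, for illustration only.
PRICE_PER_GB = 0.09
cost_per_viewer_hour = gigabytes * PRICE_PER_GB

print(f"{gigabytes:.1f} GB per viewer-hour")           # 3.6 GB per viewer-hour
print(f"${cost_per_viewer_hour:.2f} per viewer-hour")  # $0.32 per viewer-hour
print(f"${cost_per_viewer_hour * 10_000:,.0f} for 10,000 concurrent viewers")
```

<p>Note that the cost scales linearly with both audience size and bitrate, which is why egress pricing dominates the economics of live video.</p>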
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7epyhfBFSlN7AMdqROyaNE/3a8b0ad09f7f60f6438ace8adf8d1141/image4-8.png" />
            
            </figure><p>We built Stream Live to make streaming live video easy, like adding a &lt;video&gt; tag to a website. Live video is now a fundamental building block of Internet content, and we think any developer should have the tools to add it to their website or native app.</p><p>With Stream, you or your users stream live video directly to Cloudflare, and Cloudflare delivers video directly to your viewers. You never have to manage internal encoding, storage, or delivery systems — it’s just live video in and live video out.</p>
    <div>
      <h2>Our network, our hardware = a solution only Cloudflare can provide</h2>
      <a href="#our-network-our-hardware-a-solution-only-cloudflare-can-provide">
        
      </a>
    </div>
    <p>We’re not the only ones building APIs for live video — but we are the only ones with our <b><i>own</i></b> global network and hardware that we control and optimize for video. That lets us do things that others can’t, like <a href="/magic-hdmi-cable/">sub-second glass-to-glass latency</a> using RTMPS and SRT playback at scale.</p><p>Newer video codecs require specialized hardware encoders, and while others are beholden to the hardware limitations of public cloud providers, we’re already hard at work installing the latest encoding hardware in our own racks, so that you can deliver high resolution video with even less bandwidth. Our goal is to make what is otherwise only available to video giants available directly to you — stay tuned for some exciting updates on this next week.</p><p>Most providers limit how many destinations you can restream or simulcast to. Because we operate our own network, we’ve never even worried about this, and let you <a href="https://developers.cloudflare.com/stream/stream-live/simulcasting/">restream to as many destinations as you need</a>.</p><p>Operating our own network lets us price Stream based on minutes of video delivered — unlike others, we don’t pay someone else for bandwidth and then pass along their costs to you at a markup. The status quo of charging for bandwidth or per-GB storage penalizes you for delivering or storing high resolution content. If you ask why a few times, most of the time you’ll discover that others are pushing their own cost structures onto you.</p><p>Encoding video is compute-intensive, delivering video is bandwidth-intensive, and location matters when ingesting live video. When you use Stream, you don’t need to worry about optimizing performance, finding a CDN, or tweaking configuration endlessly. Stream takes care of this for you.</p>
    <div>
      <h2>Free your live video from the business models of big platforms</h2>
      <a href="#free-your-live-video-from-the-business-models-of-big-platforms">
        
      </a>
    </div>
    <p>Nearly every business uses live video, whether to engage with customers, broadcast events or monetize live content. But few have the specialized engineering resources to deliver live video at scale on their own, and wire together multiple low level cloud services. To date, many of the largest content creators have been forced to depend on a shortlist of social media apps and streaming services to deliver live content at scale.</p><p>Unlike the status quo, who force you to put your live video in <i>their</i> apps and services and fit <i>their</i> business models, Stream gives you full control of your live video, on <i>your</i> website or app, on any device, at scale, without pushing your users to someone else’s service.</p>
    <div>
      <h2>Free encoding. Free ingestion. Free analytics. Simple per-minute pricing.</h2>
      <a href="#free-encoding-free-ingestion-free-analytics-simple-per-minute-pricing">
        
      </a>
    </div>
    
<table>
<thead>
  <tr>
    <th></th>
    <th><span>Others</span></th>
    <th><span>Stream</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><span>Encoding</span></td>
    <td><span>$ per minute</span></td>
    <td><span>Free</span></td>
  </tr>
  <tr>
    <td><span>Ingestion</span></td>
    <td><span>$ per GB</span></td>
    <td><span>Free</span></td>
  </tr>
  <tr>
    <td><span>Analytics</span></td>
    <td><span>Separate product</span></td>
    <td><span>Free</span></td>
  </tr>
  <tr>
    <td><span>Live recordings</span></td>
    <td><span>Minutes or hours later</span></td>
    <td><span>Instant</span></td>
  </tr>
  <tr>
    <td><span>Storage</span></td>
    <td><span>$ per GB </span></td>
    <td><span>$ per minute stored</span></td>
  </tr>
  <tr>
    <td><span>Delivery</span></td>
    <td><span>$ per GB</span></td>
    <td><span>$ per minute delivered</span></td>
  </tr>
</tbody>
</table><p>Other platforms charge for ingestion and encoding. Many force you to consider where you’re streaming to and from, the bitrate and frames per second of your video, and even which of their datacenters you’re using.</p><p><b>With Stream, encoding and ingestion are free.</b> Other platforms charge for delivery based on bandwidth, penalizing you for delivering high quality video to your viewers. If you stream at a high resolution, you pay more.</p><p><b>With Stream, you don’t pay a penalty for delivering high resolution video.</b> Stream’s pricing is simple — minutes of video delivered and stored. Because you pay per minute, not per gigabyte, you can stream at the ideal resolution for your viewers without worrying about bandwidth costs.</p><p>Other platforms charge separately for analytics, requiring you to buy another product to get metrics about your live streams.</p><p><b>With Stream, analytics are free.</b> Stream provides an <a href="https://developers.cloudflare.com/stream/getting-analytics/fetching-bulk-analytics/">API</a> and <a href="https://dash.cloudflare.com/?to=/:account/stream/analytics">Dashboard</a> for both server-side and client-side analytics, which can be queried on a per-video, per-creator, or per-country basis, and more. 
You can use analytics to identify which creators in your app have the most viewed live streams, inform how much to bill your customers for their own usage, identify where content is going viral, and more.</p><p>Other platforms tack on live recordings or DVR mode as a separate add-on feature, and recordings only become available minutes or even hours after a live stream ends.</p><p><b>With Stream, live recordings are a built-in feature, made available</b> <a href="https://developers.cloudflare.com/stream/stream-live/watch-live-stream/#replaying-recordings"><b>instantly after a live stream ends</b></a><b>.</b> Once a live stream is available, it works just like any other video uploaded to Stream, letting you seamlessly use the same APIs for managing both pre-recorded and live content.</p>
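<p>The difference between per-GB and per-minute billing described in this section can be sketched in a few lines. All prices here are hypothetical placeholders for illustration, not Stream’s actual rates:</p>

```python
# Per-GB billing ties your delivery cost to resolution; per-minute billing does not.

def gb_delivered(bitrate_mbps: float, minutes: float) -> float:
    """Gigabytes transferred at a constant bitrate over the given duration."""
    return bitrate_mbps * minutes * 60 / 8 / 1000

PER_GB_PRICE = 0.09       # hypothetical per-GB delivery price
PER_MINUTE_PRICE = 0.001  # hypothetical per-minute-delivered price

for label, mbps in [("480p", 1.5), ("720p", 4.0), ("1080p", 8.0)]:
    per_gb_cost = gb_delivered(mbps, 60) * PER_GB_PRICE
    per_minute_cost = 60 * PER_MINUTE_PRICE
    print(f"{label}: per-GB ${per_gb_cost:.3f}/hour vs per-minute ${per_minute_cost:.3f}/hour")
```

<p>Under per-GB billing, an hour at 1080p costs more than five times an hour at 480p; under per-minute billing the cost is flat, so you can pick the resolution that suits your viewers rather than your bill.</p>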
    <div>
      <h2>Build live video into your website or app in minutes</h2>
      <a href="#build-live-video-into-your-website-or-app-in-minutes">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/HKXLvVWCAI0VaDcIf4JN5/97f176769b109ffae59440f0f3073f8b/image1-26.png" />
            
            </figure><p>Cloudflare Stream enables you or your users to go live using the same protocols and tools that broadcasters big and small use to go live to YouTube or Twitch, but gives you full control over access and presentation of live streams.</p>
    <div>
      <h3>Step 1: Create a live input</h3>
      <a href="#step-1-create-a-live-input">
        
      </a>
    </div>
    <p><a href="https://dash.cloudflare.com/?to=/:account/stream/inputs/create">Create a new live input from the Stream Dashboard</a> or use the Stream API:</p><p><b>Request</b></p>
            <pre><code>curl -X POST \
-H "Authorization: Bearer &lt;YOUR_API_TOKEN&gt;" \
-H "Content-Type: application/json" \
-d '{"recording": {"mode": "automatic"}}' \
https://api.cloudflare.com/client/v4/accounts/&lt;YOUR_CLOUDFLARE_ACCOUNT_ID&gt;/stream/live_inputs</code></pre>
            <p><b>Response</b></p>
            <pre><code>{
"result": {
"uid": "&lt;UID_OF_YOUR_LIVE_INPUT&gt;",
"rtmps": {
"url": "rtmps://live.cloudflare.com:443/live/",
"streamKey": "&lt;PRIVATE_RTMPS_STREAM_KEY&gt;"
},
...
}
}</code></pre>
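<p>If you prefer to make the same request from code rather than curl, here is a minimal Python sketch using only the endpoint and response fields shown above; the account ID and API token are placeholders you must supply:</p>

```python
import json
import urllib.request

ACCOUNT_ID = "<YOUR_CLOUDFLARE_ACCOUNT_ID>"  # placeholder
API_TOKEN = "<YOUR_API_TOKEN>"               # placeholder
API_URL = f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/stream/live_inputs"

def extract_rtmps(payload):
    """Pull the RTMPS ingest URL and secret stream key out of the API response."""
    rtmps = payload["result"]["rtmps"]
    return rtmps["url"], rtmps["streamKey"]

def create_live_input():
    """POST a new live input with automatic recording, as in the curl example above."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps({"recording": {"mode": "automatic"}}).encode(),
        headers={
            "Authorization": f"Bearer {API_TOKEN}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return extract_rtmps(json.load(resp))
```

<p>The RTMPS URL is shared, but the stream key is a per-input secret: treat it like a password, since anyone holding it can broadcast to your live input.</p>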
            
    <div>
      <h3>Step 2: Use the RTMPS key with any live broadcasting software, or in your own native app</h3>
      <a href="#step-2-use-the-rtmps-key-with-any-live-broadcasting-software-or-in-your-own-native-app">
        
      </a>
    </div>
    <p>Copy the RTMPS URL and key, and use them with your live streaming application. We recommend using <a href="https://obsproject.com/">Open Broadcaster Software (OBS)</a> to get started, but any RTMPS or SRT compatible software should be able to interoperate with Stream Live.</p><p>Enter the Stream RTMPS URL and the Stream Key from Step 1:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6s7amxQ54W9Ik3FqA8SttE/e6a73200ca353bb12b077d2bef2ad685/image5-7.png" />
            
            </figure>
    <div>
      <h3>Step 3: Preview your live stream in the Cloudflare Dashboard</h3>
      <a href="#step-3-preview-your-live-stream-in-the-cloudflare-dashboard">
        
      </a>
    </div>
    <p>In the Stream Dashboard, within seconds of going live, you will see a preview of what your viewers will see, along with the real-time connection status of your live stream.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4KwHTeXov4qVK1WDYiEBLS/c46f5276ac6bacdb8aa29a0bd1388903/Screen-Shot-2022-09-21-at-12.34.35-PM.png" />
            
            </figure>
    <div>
      <h3>Step 4: Add live video playback to your website or app</h3>
      <a href="#step-4-add-live-video-playback-to-your-website-or-app">
        
      </a>
    </div>
    <p>Stream your video using our <a href="https://developers.cloudflare.com/stream/viewing-videos/using-the-stream-player/">Stream Player embed code</a>, or use <a href="https://developers.cloudflare.com/stream/viewing-videos/using-own-player/">any video player that supports HLS or DASH</a> — live streams can be played in both websites or native iOS and Android apps.</p><p>For example, on iOS, all you need to do is provide AVPlayer with the URL to the HLS manifest for your live input, which you can find <a href="https://developers.cloudflare.com/stream/stream-live/watch-live-stream/">via the API</a> or in the <a href="https://dash.cloudflare.com/?to=/:account/stream">Stream Dashboard</a>.</p>
            <pre><code>import SwiftUI
import AVKit

struct MyView: View {
    // Change the url to the Cloudflare Stream HLS manifest URL
    private let player = AVPlayer(url: URL(string: "https://customer-9cbb9x7nxdw5hb57.cloudflarestream.com/8f92fe7d2c1c0983767649e065e691fc/manifest/video.m3u8")!)

    var body: some View {
        VideoPlayer(player: player)
            .onAppear() {
                player.play()
            }
    }
}

struct MyView_Previews: PreviewProvider {
    static var previews: some View {
        MyView()
    }
}</code></pre>
            <p>To run a complete example app in XCode, follow <a href="https://developers.cloudflare.com/stream/examples/ios/">this guide</a> in the Stream Developer docs.</p>
    <div>
      <h2>Companies are building whole new video platforms on top of Stream</h2>
      <a href="#companies-are-building-whole-new-video-platforms-on-top-of-stream">
        
      </a>
    </div>
    <p>Developers want control, but most don’t have time to become video experts. And even video experts building innovative new platforms don’t want to manage live streaming infrastructure.</p><p>Switcher Studio’s whole business is live video: their iOS app allows creators and businesses to produce their own branded, multi-camera live streams. Switcher uses Stream as an essential part of their live streaming infrastructure. In their own words:</p><blockquote><p><i>“Since 2014, Switcher has helped creators connect to audiences with livestreams. Now, our users create over 100,000 streams per month. As we grew, we needed a scalable content delivery solution. Cloudflare offers secure, fast delivery, and even allowed us to offer new features, like multistreaming. Trusting Cloudflare Stream lets our team focus on the live production tools that make Switcher unique."</i></p></blockquote><p>While Stream Live has been in beta, we’ve worked with many customers like Switcher, where live video isn’t just one product feature, it <b><i>is</i></b> the core of their product. Even as experts in live video, they choose to use Stream, so that they can focus on the unique value they create for their customers, leaving the infrastructure of ingesting, encoding, recording and delivering live video to Cloudflare.</p>
    <div>
      <h2>Start building live video into your website or app today</h2>
      <a href="#start-building-live-video-into-your-website-or-app-today">
        
      </a>
    </div>
    <p>It takes just a few minutes to sign up and start your first live stream from the Cloudflare Dashboard, with no code required to get started, and <a href="https://developers.cloudflare.com/stream/">APIs</a> for when you’re ready to start building your own live video features. <a href="https://dash.cloudflare.com/?to=/:account/stream">Give it a try</a> — we’re ready for you, no matter the size of your audience.</p> ]]></content:encoded>
            <category><![CDATA[GA Week]]></category>
            <category><![CDATA[General Availability]]></category>
            <category><![CDATA[Cloudflare Stream]]></category>
            <category><![CDATA[Video]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Developers]]></category>
            <guid isPermaLink="false">2E56cHHAs6xh0x5bjlqC9U</guid>
            <dc:creator>Brendan Irvine-Broque</dc:creator>
            <dc:creator>Kyle Boutette</dc:creator>
            <dc:creator>Mickie Betz</dc:creator>
        </item>
    </channel>
</rss>