
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how Cloudflare products are built and the technologies we use, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Wed, 08 Apr 2026 21:54:48 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Introducing Markdown for Agents]]></title>
            <link>https://blog.cloudflare.com/markdown-for-agents/</link>
            <pubDate>Thu, 12 Feb 2026 14:03:00 GMT</pubDate>
            <description><![CDATA[ The way content is discovered online is shifting, from traditional search engines to AI agents that need structured data from a Web built for humans. It’s time to consider not just human visitors, but start to treat agents as first-class citizens. Markdown for Agents automatically converts any HTML page requested from our network to markdown. ]]></description>
            <content:encoded><![CDATA[ <p>The way content and businesses are discovered online is changing rapidly. In the past, traffic originated from traditional search engines, and SEO determined who got found first. Now the traffic is increasingly coming from AI crawlers and agents that demand structured data within the often-unstructured Web that was built for humans.</p><p>For a business to stay ahead, now is the time to consider not just human visitors and traditional SEO wisdom, but to start treating agents as first-class citizens.</p>
    <div>
      <h2>Why markdown is important</h2>
      <a href="#why-markdown-is-important">
        
      </a>
    </div>
    <p>Feeding raw HTML to an AI is like paying by the word to read packaging instead of the letter inside. A simple <code>## About Us</code> on a page in markdown costs roughly 3 tokens; its HTML equivalent – <code>&lt;h2 class="section-title" id="about"&gt;About Us&lt;/h2&gt;</code> – burns 12-15, and that’s before you account for the <code>&lt;div&gt;</code> wrappers, nav bars, and script tags that pad every real web page and have zero semantic value.</p><p>This blog post you’re reading takes 16,180 tokens in HTML and 3,150 tokens when converted to markdown. <b>That’s an 80% reduction in token usage</b>.</p><p><a href="https://en.wikipedia.org/wiki/Markdown"><u>Markdown</u></a> has quickly become the <i>lingua franca</i> for agents and AI systems as a whole. The format’s explicit structure makes it ideal for AI processing, yielding better output while minimizing token waste.</p><p>The problem is that the Web is made of HTML, not markdown, and page weight has been <a href="https://almanac.httparchive.org/en/2025/page-weight#page-weight-over-time"><u>steadily increasing</u></a> over the years, making pages hard to parse. Agents have to filter out all the non-essential elements before they can scan the relevant content.</p><p>The conversion of HTML to markdown is now a common step in any AI pipeline. Still, this process is far from ideal: it wastes computation, adds cost and processing complexity, and above all, it may not be how the content creator intended their content to be used in the first place.</p><p>What if AI agents could bypass the complexities of intent analysis and document conversion, and instead receive structured markdown directly from the source?</p>
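    <p>A quick sanity check of that figure, using the two token counts just quoted:</p>
    <pre><code>// Reduction implied by the counts above: 16,180 tokens as HTML
// versus 3,150 tokens as markdown.
const htmlTokens = 16_180;
const markdownTokens = 3_150;
const reductionPct = Math.floor((1 - markdownTokens / htmlTokens) * 100);
// reductionPct is 80</code></pre>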
    <div>
      <h2>Convert HTML to markdown, automatically</h2>
      <a href="#convert-html-to-markdown-automatically">
        
      </a>
    </div>
    <p>Cloudflare’s network now supports real-time content conversion at the source for <a href="https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/"><u>enabled zones</u></a>, using <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/Content_negotiation"><u>content negotiation</u></a> headers. Now when AI systems request pages from any website that uses Cloudflare and has Markdown for Agents enabled, they can express a preference for <code>text/markdown</code> in the request. Our network will automatically and efficiently convert the HTML to markdown, when possible, on the fly.</p><p>Here’s how it works. To fetch the markdown version of any page from a zone with Markdown for Agents enabled, the client adds the <b>Accept</b> negotiation header with <code>text/markdown</code> as one of the options. Cloudflare will detect this, fetch the original HTML version from the origin, and convert it to markdown before serving it to the client.</p><p>Here’s a curl example with the Accept negotiation header requesting a page from our developer documentation:</p>
            <pre><code>curl https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/ \
  -H "Accept: text/markdown"
</code></pre>
            <p>Or if you’re building an AI Agent using Workers, you can use TypeScript:</p>
            <pre><code>const r = await fetch(
  `https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/`,
  {
    headers: {
      Accept: "text/markdown, text/html",
    },
  },
);
const tokenCount = r.headers.get("x-markdown-tokens");
const markdown = await r.text();
</code></pre>
            <p>We already see some of the most popular coding agents today – like Claude Code and OpenCode – send these Accept headers with their requests for content. Now, the response to such a request is formatted in markdown. It’s that simple.</p>
            <pre><code>HTTP/2 200
date: Wed, 11 Feb 2026 11:44:48 GMT
content-type: text/markdown; charset=utf-8
content-length: 2899
vary: accept
x-markdown-tokens: 725
content-signal: ai-train=yes, search=yes, ai-input=yes

---
title: Markdown for Agents · Cloudflare Agents docs
---

## What is Markdown for Agents

The ability to parse and convert HTML to Markdown has become foundational for AI.
...
</code></pre>
            <p>Note that we include an <code>x-markdown-tokens</code> header with the converted response that indicates the estimated number of tokens in the markdown document. You can use this value in your flow, for example to calculate the size of a context window or to decide on your chunking strategy.</p><p>Here’s a diagram of how it works:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6Zw1Q5kBBqTrouN1362H5I/3080d74a2a971be1f1e7e0ba79611998/BLOG-3162_2.png" />
          </figure>
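    <p>The <code>x-markdown-tokens</code> estimate is useful before you ever read the body. Here’s a sketch of how it could feed a chunking decision; the helper and the 8,000-token context budget below are illustrative assumptions, not part of the API:</p>
    <pre><code>// How many roughly equal chunks keep each piece inside a model's
// context budget. The helper name and the 8,000-token budget are
// illustrative assumptions, not part of Markdown for Agents.
function chunksNeeded(estimatedTokens: number, contextBudget: number): number {
  return Math.max(1, Math.ceil(estimatedTokens / contextBudget));
}

async function fetchMarkdownWithPlan(url: string, contextBudget: number = 8_000) {
  const r = await fetch(url, { headers: { Accept: "text/markdown, text/html" } });
  const estimated = Number(r.headers.get("x-markdown-tokens") ?? "0");
  return { markdown: await r.text(), chunks: chunksNeeded(estimated, contextBudget) };
}</code></pre>
    <p>For the developer documentation page above, <code>x-markdown-tokens: 725</code> means the whole document fits in a single chunk.</p>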
    <div>
      <h3>Content Signals Policy</h3>
      <a href="#content-signals-policy">
        
      </a>
    </div>
    <p>During our last Birthday Week, Cloudflare <a href="https://blog.cloudflare.com/content-signals-policy/"><u>announced</u></a> Content Signals — <a href="http://contentsignals.org"><u>a framework</u></a> that allows anyone to express their preferences for how their content can be used after it has been accessed. </p><p>When you return markdown, you still want to express how your content may be used by the agent or AI crawler. That’s why Markdown for Agents converted responses include the <code>Content-Signal: ai-train=yes, search=yes, ai-input=yes</code> header, signaling that the content can be used for AI training, search results, and AI input, which includes agentic use. Markdown for Agents will provide options to define custom Content Signal policies in the future.</p><p>Check our dedicated <a href="https://contentsignals.org/"><u>Content Signals</u></a> page for more information on this framework.</p>
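    <p>On the consuming side, an agent can read this header and honor those preferences before using the content. Here’s a minimal parser sketch (the helper name is ours, not part of the Content Signals framework):</p>
    <pre><code>// Parse a header like "ai-train=yes, search=yes, ai-input=yes"
// into a lookup of allowed uses. Minimal sketch; malformed entries
// simply resolve to false.
function parseContentSignal(header: string): { [signal: string]: boolean } {
  const signals: { [signal: string]: boolean } = {};
  for (const part of header.split(",")) {
    const [name, value] = part.split("=").map((s) => s.trim());
    if (name) signals[name] = value === "yes";
  }
  return signals;
}

const policy = parseContentSignal("ai-train=yes, search=yes, ai-input=yes");
// policy["ai-train"] is true, so this document may be used for training</code></pre>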
    <div>
      <h3>Try it with the Cloudflare Blog &amp; Developer Documentation </h3>
      <a href="#try-it-with-the-cloudflare-blog-developer-documentation">
        
      </a>
    </div>
    <p>We enabled this feature in our <a href="https://developers.cloudflare.com/"><u>Developer Documentation</u></a> and our <a href="https://blog.cloudflare.com/"><u>Blog</u></a>, inviting all AI crawlers and agents to consume our content using markdown instead of HTML.</p><p>Try it out now by requesting this blog with <code>Accept: text/markdown</code>.</p>
            <pre><code>curl https://blog.cloudflare.com/markdown-for-agents/ \
  -H "Accept: text/markdown"</code></pre>
            <p>The result is:</p>
            <pre><code>---
description: The way content is discovered online is shifting, from traditional search engines to AI agents that need structured data from a Web built for humans. It’s time to consider not just human visitors, but start to treat agents as first-class citizens. Markdown for Agents automatically converts any HTML page requested from our network to markdown.
title: Introducing Markdown for Agents
image: https://blog.cloudflare.com/images/markdown-for-agents.png
---

# Introducing Markdown for Agents

The way content and businesses are discovered online is changing rapidly. In the past, traffic originated from traditional search engines and SEO determined who got found first. Now the traffic is increasingly coming from AI crawlers and agents that demand structured data within the often-unstructured Web that was built for humans.

...</code></pre>
            
    <div>
      <h3>Other ways to convert to Markdown</h3>
      <a href="#other-ways-to-convert-to-markdown">
        
      </a>
    </div>
    <p>If you’re building AI systems that require arbitrary document conversion from sources outside Cloudflare, or where Markdown for Agents is not available from the content source, we provide other ways to convert documents to markdown for your applications:</p><ul><li><p>Workers AI <a href="https://developers.cloudflare.com/workers-ai/features/markdown-conversion/"><u>AI.toMarkdown()</u></a> supports multiple document types, not just HTML, as well as summarization.</p></li><li><p>The Browser Rendering <a href="https://developers.cloudflare.com/browser-rendering/rest-api/markdown-endpoint/"><u>/markdown</u></a> REST API supports markdown conversion if you need to render a dynamic page or application in a real browser before converting it.</p></li></ul>
    <div>
      <h2>Tracking markdown usage</h2>
      <a href="#tracking-markdown-usage">
        
      </a>
    </div>
    <p>Anticipating a shift in how AI systems browse the Web, Cloudflare Radar now includes content type insights for AI bot and crawler traffic, both globally on the <a href="https://radar.cloudflare.com/ai-insights#content-type"><u>AI Insights</u></a> page and in the <a href="https://radar.cloudflare.com/bots/directory/gptbot"><u>individual bot</u></a> information pages.</p><p>The new <code>content_type</code> dimension and filter shows the distribution of content types returned to AI agents and crawlers, grouped by <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/MIME_types"><u>MIME type</u></a> category.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7vQzvzHsTLPXGhoQK0Xbr5/183129a8947990bc4ee5bb5ca7ba71b5/BLOG-3162_3.png" />
          </figure><p>You can also see the requests for markdown filtered by a specific agent or crawler. Here are the requests that return markdown to OAI-Searchbot, the crawler used by OpenAI to power ChatGPT’s search: </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7Ah99DWLxnYjadW6xJhAXg/afef4a29ae504d4fe69df4f9823dd103/BLOG-3162_4.png" />
          </figure><p>This new data will allow us to track the evolution of how AI bots, crawlers, and agents are consuming Web content over time. As always, everything on Radar is freely accessible via the <a href="https://developers.cloudflare.com/api/resources/radar/"><u>public APIs</u></a> and the <a href="https://radar.cloudflare.com/explorer?dataSet=ai.bots&amp;groupBy=content_type&amp;filters=userAgent%253DGPTBot&amp;timeCompare=1"><u>Data Explorer</u></a>. </p>
    <div>
      <h2>Start using it today</h2>
      <a href="#start-using-today">
        
      </a>
    </div>
    <p>To enable Markdown for Agents for your zone, log in to the Cloudflare <a href="https://dash.cloudflare.com/"><u>dashboard</u></a>, select your account and zone, find Quick Actions, and toggle Markdown for Agents on. This feature is available today in beta at no cost for Pro, Business, and Enterprise plans, as well as SSL for SaaS customers.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1UqzmHrNa1UdCCI6eXIfmn/3da0ff51dd94219d8af87c172d83fc72/BLOG-3162_5.png" />
          </figure><p>You can find more information about Markdown for Agents on our <a href="https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/">Developer Docs</a>. We welcome your feedback as we continue to refine and enhance this feature. We’re curious to see how AI crawlers and agents navigate and adapt to the unstructured nature of the Web as it evolves.</p> ]]></content:encoded>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Agents]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">5uEb99xvnHVk3QfN0KMjb6</guid>
            <dc:creator>Celso Martinho</dc:creator>
            <dc:creator>Will Allen</dc:creator>
        </item>
        <item>
            <title><![CDATA[Human Native is joining Cloudflare]]></title>
            <link>https://blog.cloudflare.com/human-native-joins-cloudflare/</link>
            <pubDate>Thu, 15 Jan 2026 14:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare acquires Human Native, an AI data marketplace specializing in transforming content into searchable and useful data, to accelerate work building new economic models for the Internet. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Today, we’re excited to share that Cloudflare has acquired <a href="https://www.humannative.ai/"><u>Human Native</u></a>, a UK-based AI data marketplace specializing in transforming multimedia content into searchable and useful data.</p>
    <div>
      <h3>Human Native x Cloudflare</h3>
      <a href="#human-native-x-cloudflare">
        
      </a>
    </div>
    <p>The Human Native team has spent the past few years focused on helping AI developers create better AI through licensed data. Their technology helps publishers and developers turn messy, unstructured content into something that can be understood, licensed and ultimately valued. They have approached data not as something to be scraped, but as an asset class that deserves structure, transparency and respect.</p><p>Access to high-quality data can lead to better technical performance. One of Human Native’s customers, a prominent UK video AI company, threw away their existing training data after achieving superior results with data sourced through Human Native. Going forward, they will train only on fully licensed, reputably sourced, high-quality content.</p><p>This gives a preview of what the economic model of the Internet can be in the age of generative AI: better AI built on better data, with fair control, compensation and credit for creators.</p>
    <div>
      <h3>The Internet needs new economic models</h3>
      <a href="#the-internet-needs-new-economic-models">
        
      </a>
    </div>
    <p>For the last 30 years, the open Internet has been based on a fundamental value exchange: creators create content, aggregators (such as search engines or social media) send traffic. Creators can monetize that traffic through advertisements, subscriptions or direct support. This is the economic loop that has powered the explosive growth of the Internet.</p><p>But it’s under real strain.</p><p><a href="https://blog.cloudflare.com/crawlers-click-ai-bots-training/"><u>Crawl-to-referral</u></a> ratios are skyrocketing, with tens of thousands of AI and bot crawls per real human visitor, and it’s unclear how multipurpose crawlers are using the content they access.</p><p>The community of creators who publish on the Internet is a diverse group: news publishers, content creators, financial professionals, technology companies, aggregators and more. But they have one thing in common: They want to decide how their content is used by AI systems.</p><p>Cloudflare’s work in building <a href="https://www.cloudflare.com/en-gb/ai-crawl-control/"><u>AI Crawl Control</u></a> and <a href="https://developers.cloudflare.com/ai-crawl-control/features/pay-per-crawl/what-is-pay-per-crawl/"><u>Pay Per Crawl</u></a> is predicated on a simple philosophy: Content owners should get to decide how and when their content is accessed by others. Many of our customers want to optimize their brand and content to make sure it is in every training data set and shows up in every new search; others want to have more control and only allow access if there is direct compensation.</p><p>Our tools like <a href="https://developers.cloudflare.com/ai-search/"><u>AI Search</u></a>, AI Crawl Control and Pay Per Crawl can help, wherever you land in that equation. The important thing is that the content owner gets to decide.</p>
    <div>
      <h3>New tools for AI developers</h3>
      <a href="#new-tools-for-ai-developers">
        
      </a>
    </div>
    <p>With the Human Native team joining Cloudflare, we are accelerating our work in helping customers transform their content to be easily accessed and understood by AI bots and agents in addition to their traditional human audiences.</p><p>Crawling is complex, expensive in engineering and compute terms, and comes with no guarantees of quality control. A crawled index can contain duplicates, spam, illegal material and many more headaches. Developers are left with messy, unstructured data.</p><p>We recently announced our work in building the <a href="https://blog.cloudflare.com/an-ai-index-for-all-our-customers/"><u>AI Index</u></a>, a powerful new way for both foundation model companies and agents to access content at scale.</p><p>Instead of sending crawlers blindly and repeatedly across the open Internet, AI developers will be able to connect via a pub/sub model: participating websites will expose structured updates whenever their content changes, and developers will be able to subscribe to receive those updates in real time. </p><p>This opens up new avenues for content creators to experiment with new business models. </p>
    <div>
      <h3>Building the foundation for these new business models</h3>
      <a href="#building-the-foundation-for-these-new-business-models">
        
      </a>
    </div>
    <p>Cloudflare is investing heavily in creating the foundations for these new business models, starting with x402.</p><p>We recently announced that we are creating the <a href="https://blog.cloudflare.com/x402/"><u>x402 Foundation</u></a>, in partnership with Coinbase, to enable machine-to-machine transactions for digital resources.</p><p>Payments on the web have historically been designed for humans. We browse a merchant’s website, show intent by adding items to a cart, and confirm our intent to purchase by putting in our credit card information and clicking “Pay.” But what if you want to enable direct transactions between automated systems? We need protocols to allow machine-to-machine transactions. </p><p>Together, Human Native and Cloudflare will accelerate our work in building the basis of these new economic models for the Internet. </p>
    <div>
      <h3>What’s next</h3>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>The Internet works best when it is open, fair, and independently sustainable. We’re excited to welcome the Human Native team to Cloudflare, and even more excited about what we will build together to improve the foundations of the Internet in the age of AI.</p><p>Onwards.</p> ]]></content:encoded>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Generative AI]]></category>
            <category><![CDATA[Data]]></category>
            <category><![CDATA[Acquisitions]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <guid isPermaLink="false">Szd19ssv1kbKxjxNZhUmR</guid>
            <dc:creator>Will Allen</dc:creator>
            <dc:creator>James Smith</dc:creator>
        </item>
        <item>
            <title><![CDATA[Securing agentic commerce: helping AI Agents transact with Visa and Mastercard]]></title>
            <link>https://blog.cloudflare.com/secure-agentic-commerce/</link>
            <pubDate>Fri, 24 Oct 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare is partnering with Visa and Mastercard to help secure the future of agentic commerce. ]]></description>
            <content:encoded><![CDATA[ <p>The era of agentic commerce is coming, and it brings with it significant new challenges for security. That’s why Cloudflare is partnering with Visa and Mastercard to help secure automated commerce as AI agents search, compare, and purchase on behalf of consumers.</p><p>Through our collaboration, Visa developed the <a href="https://github.com/visa/trusted-agent-protocol"><u>Trusted Agent Protocol</u></a> and Mastercard developed <a href="https://www.mastercard.com/us/en/business/artificial-intelligence/mastercard-agent-pay.html"><u>Agent Pay</u></a> to help merchants distinguish legitimate, approved agents from malicious bots. Both Trusted Agent Protocol and Agent Pay leverage <a href="https://blog.cloudflare.com/web-bot-auth/"><u>Web Bot Auth</u></a> as the agent authentication layer to allow networks like Cloudflare to verify traffic from AI shopping agents that register with a payment network.</p>
    <div>
      <h2>The challenges with agentic commerce</h2>
      <a href="#the-challenges-with-agentic-commerce">
        
      </a>
    </div>
    <p>Agentic commerce is commerce driven by AI agents. As AI agents execute more transactions, merchants need to protect themselves and maintain trust with their customers. Merchants are beginning to see the promise of agentic commerce but face significant challenges: </p><ul><li><p>How can they distinguish a helpful, approved AI shopping agent from a malicious bot or web crawler? </p></li><li><p>Is the agent representing a known, repeat customer or someone entirely new? </p></li><li><p>Are there particular instructions the consumer gave to their agent that the merchant should respect?</p></li></ul><p>We are working with Visa and Mastercard, two of the most trusted consumer brands in payments, to address each of these challenges. </p>
    <div>
      <h2>Web Bot Auth is the foundation to securing agentic commerce</h2>
      <a href="#web-bot-auth-is-the-foundation-to-securing-agentic-commerce">
        
      </a>
    </div>
    <p>In May, we shared a new proposal called <a href="https://blog.cloudflare.com/web-bot-auth/"><u>Web Bot Auth</u></a> to cryptographically authenticate agent traffic. Historically, agent traffic has been classified using the user agent and IP address. However, these fields can be spoofed, leading to inaccurate classifications and incorrectly applied bot mitigations. Web Bot Auth allows an agent to provide a stable identifier by using <a href="https://datatracker.ietf.org/doc/html/draft-meunier-web-bot-auth-architecture"><u>HTTP Message Signatures</u></a> with public key cryptography.</p><p>As we spent time collaborating with the teams at Visa and Mastercard, we found that we could leverage Web Bot Auth as the foundation to ensure that each commerce agent request was verifiable, time-bound, and non-replayable.</p><p>Visa’s Trusted Agent Protocol and Mastercard’s Agent Pay present three key solutions for merchants to manage agentic commerce transactions. First, merchants can identify a registered agent and distinguish whether a particular interaction is intended to browse or to pay. Second, merchants can link an agent to a consumer identity. Last, merchants can indicate to agents how a payment is expected, whether that is through a network token, browser-use guest checkout, or a micropayment.</p><p>This allows merchants that integrate with these protocols to instantly recognize a trusted agent during two key interactions: the initial browsing phase to determine product details and final costs, and the final payment interaction to complete a purchase. Ultimately, this provides merchants with the tools to verify these signatures, identify trusted interactions, and securely manage how these agents can interact with their site.</p>
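    <p>The primitive underneath is easy to demonstrate. Here’s a toy Node.js sketch of signing and verifying a string with an Ed25519 key pair; a real implementation builds the signature base from the covered components per RFC 9421, so the base string below is only illustrative:</p>
    <pre><code>import { generateKeyPairSync, sign, verify } from "node:crypto";

// Toy signature base in the spirit of the "@authority" and "@path"
// covered components; real implementations canonicalize per RFC 9421.
const signatureBase = '"@authority": www.example.com\n"@path": /catalog';

const { publicKey, privateKey } = generateKeyPairSync("ed25519");

// Ed25519 signs the raw message directly (no separate digest step).
const signature = sign(null, Buffer.from(signatureBase), privateKey);

// Anyone holding the public key can verify; only the private key holder
// could have produced the signature, and any change to the base fails.
const ok = verify(null, Buffer.from(signatureBase), publicKey, signature);
// ok is true</code></pre>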
    <div>
      <h2>How it works: leveraging HTTP message signatures </h2>
      <a href="#how-it-works-leveraging-http-message-signatures">
        
      </a>
    </div>
    <p>To make this work, an ecosystem of participants needs to be on the same page. It all starts with <i>agent</i> <i>developers</i>, who build the agents that shop on behalf of consumers. These agents then interact with <i>merchants</i>, who need a reliable way to verify that a request is made on behalf of a consumer. Merchants rely on networks like Cloudflare to verify the agent's cryptographic signatures and ensure the interaction is legitimate. Finally, there are payment networks like Visa and Mastercard, who can link cardholder identity to agentic commerce transactions, helping ensure that transactions are verifiable and accountable.</p><p>When developing their protocols, Visa and Mastercard needed a secure way to authenticate each agent developer and securely transmit information from the agent to the merchant’s website. That’s where we came in and worked with their teams to build upon Web Bot Auth. <a href="https://datatracker.ietf.org/doc/html/draft-meunier-web-bot-auth-architecture"><u>Web Bot Auth</u></a> proposals specify how developers of bots and agents can attach their cryptographic signatures to HTTP requests by using <a href="https://www.rfc-editor.org/rfc/rfc9421"><u>HTTP Message Signatures</u></a>. </p><p>Both Visa and Mastercard protocols require agents to register and have their public keys (referenced as the <code>keyid</code> in the Signature-Input header) in a well-known directory, allowing merchants and networks to fetch the keys to validate these HTTP message signatures. To start, Visa and Mastercard will host their own directories for Visa-registered and Mastercard-registered agents, respectively.</p><p>The newly created agents then communicate their registration, identity, and payment details to the merchant using these HTTP Message Signatures. Both protocols build on Web Bot Auth by introducing a new tag that agents must supply in the <code>Signature-Input</code> header, which indicates whether the agent is browsing or purchasing. Merchants can use the tag to determine whether to interact with the agent. Agents must also include the nonce field, a unique sequence included in the signature, to protect against replay attacks.</p><p>An agent visiting a merchant’s website to browse a catalog would include an HTTP Message Signature in its request to prove it is authorized to browse the merchant’s storefront on behalf of a specific Visa cardholder:</p>
            <pre><code>GET /path/to/resource HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 Chrome/113.0.0 MyShoppingAgent/1.1
Signature-Input: 
  sig2=("@authority" "@path"); 
  created=1735689600; 
  expires=1735693200; 
  keyid="poqkLGiymh_W0uP6PZFw-dvez3QJT5SolqXBCW38r0U"; 
  alg="Ed25519";   nonce="e8N7S2MFd/qrd6T2R3tdfAuuANngKI7LFtKYI/vowzk4IAZyadIX6wW25MwG7DCT9RUKAJ0qVkU0mEeLEIW1qg=="; 
  tag="web-bot-auth"
Signature: sig2=:jdq0SqOwHdyHr9+r5jw3iYZH6aNGKijYp/EstF4RQTQdi5N5YYKrD+mCT1HA1nZDsi6nJKuHxUi/5Syp3rLWBA==:</code></pre>
            <p>Trusted Agent Protocol and Agent Pay are designed for merchants to benefit from their validation mechanisms without changing their infrastructure. Instead, merchants can set the rules for agent interactions on their site and rely upon Cloudflare as the validator. For these requests, Cloudflare will run <a href="https://blog.cloudflare.com/verified-bots-with-cryptography/#message-signature-verification-for-origins"><u>the following checks</u></a>:</p><ol><li><p>Confirm the presence of the <code>Signature-Input</code> and <code>Signature</code> headers.</p></li><li><p>Pull the <code>keyid</code> from the Signature-Input. If Cloudflare has not previously retrieved and cached the key, fetch it from the public key directory.</p></li><li><p>Confirm the current time falls between the <code>created</code> and <code>expires</code> timestamps.</p></li><li><p>Check <code>nonce</code> uniqueness in the cache. By checking if a nonce has been recently used, Cloudflare can reject reused or expired signatures, ensuring the request is not a malicious copy of a prior, legitimate interaction.</p></li><li><p>Check the validity of the <code>tag</code>, as defined by the protocol. If the agent is browsing, the tag should be <code>agent-browser-auth</code>. If the agent is paying, the tag should be <code>agent-payer-auth</code>. </p></li><li><p>Reconstruct the canonical <a href="https://www.rfc-editor.org/rfc/rfc9421#name-creating-the-signature-base"><u>signature base</u></a> using the <a href="https://www.rfc-editor.org/rfc/rfc9421#covered-components"><u>components</u></a> from the <code>Signature-Input</code> header. </p></li><li><p>Perform the cryptographic <a href="https://www.rfc-editor.org/rfc/rfc9421#name-eddsa-using-curve-edwards25"><u>ed25519 signature verification</u></a> using the key supplied in <code>keyid</code>.</p></li></ol><p>Here is an <a href="https://github.com/visa/trusted-agent-protocol"><u>example from Visa</u></a> on the flow for agent validation:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4Preu2aFUSuW5o3UWE6281/caf5354a009fb89c8b01cfef10fc3e87/image3.png" />
          </figure><p>Mastercard’s Agent Pay validation flow is outlined below:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1vzpRtW4dRzsNGdc4Vnxf1/c780ef45a0b1fc263eb13b62b2af5457/image2.png" />
          </figure>
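    <p>To make these flows concrete, here is a simplified model of checks 1 through 5 above (header presence, time window, nonce uniqueness, and tag validity). This is a sketch of the logic, not Cloudflare’s implementation: the signature parameters arrive pre-parsed, the nonce cache is an in-memory stand-in, and checks 6 and 7 (signature-base reconstruction and ed25519 verification) are left as a comment:</p>
    <pre><code>// Simplified model of the validation checks; not Cloudflare's implementation.
type SignatureParams = {
  keyid: string;
  created: number; // Unix seconds
  expires: number;
  nonce: string;
  tag: string;
};

const VALID_TAGS = ["agent-browser-auth", "agent-payer-auth"];
const seenNonces = new Set(); // stand-in for a shared nonce cache

function checkAgentRequest(sig: SignatureParams, now: number): string {
  if (!sig.keyid) return "reject: missing keyid"; // checks 1-2
  if (sig.created > now || now > sig.expires) return "reject: outside validity window"; // check 3
  if (seenNonces.has(sig.nonce)) return "reject: nonce replayed"; // check 4
  seenNonces.add(sig.nonce);
  if (!VALID_TAGS.includes(sig.tag)) return "reject: unknown tag"; // check 5
  // checks 6-7: rebuild the RFC 9421 signature base and verify the
  // ed25519 signature against the key fetched for sig.keyid
  return "ok";
}</code></pre>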
    <div>
      <h3>What’s next: Cloudflare’s Agent SDK &amp; Managed Rules</h3>
      <a href="#whats-next-cloudflares-agent-sdk-managed-rules">
        
      </a>
    </div>
    <p>We recently introduced support for <a href="https://blog.cloudflare.com/x402/#cloudflares-mcp-servers-agents-sdk-and-x402-payments"><u>x402 transactions</u></a> into Cloudflare’s <a href="https://agents.cloudflare.com/"><u>Agent SDK</u></a>, allowing anyone building an agent to easily transact using the new x402 protocol. We will similarly be working with Visa and Mastercard over the coming months to bring support for their protocols directly to the Agents SDK. This will allow developers to manage their registered agent’s private keys and to easily create the correct HTTP message signatures to authorize their agent to browse and transact on a merchant website.</p><p>Conceptually, the requests in a Cloudflare Worker would look something like this:</p>
            <pre><code>/**
 * Pseudocode example of a Cloudflare Worker acting as a trusted agent.
 * This version explicitly illustrates the signing logic to show the core flow. 
 */


// Helper function to encapsulate the signing protocol logic.
async function createSignatureHeaders(targetUrl, credentials) {
    // Internally, this function would perform the detailed cryptographic steps:
    // 1. Generate timestamps and a unique nonce.
    // 2. Construct the 'Signature-Input' header string with all required parameters.
    // 3. Build the canonical 'Signature Base' string according to the spec.
    // 4. Use the private key to sign the base string.
    // 5. Return the fully formed 'Signature-Input' and 'Signature' headers.
    
    const signedHeaders = new Headers();
    
    signedHeaders.set('Signature-Input', 'sig2=(...); keyid="..."; ...');
    signedHeaders.set('Signature', 'sig2=:...');
    return signedHeaders;
}


export default {
    async fetch(request, env) {
        // 1. Load the final API endpoint and private signing credentials.
        const targetUrl = new URL(request.url).searchParams.get('target');
        if (!targetUrl) {
            return new Response('Missing "target" query parameter', { status: 400 });
        }
        const credentials = {
            privateKey: env.PAYMENT_NETWORK_PRIVATE_KEY,
            keyId: env.PAYMENT_NETWORK_KEY_ID
        };


        // 2. Generate the required signature headers using the helper.
        const signatureHeaders = await createSignatureHeaders(targetUrl, credentials);


        // 3. Attach the newly created signature headers to the request for authentication.
        const signedRequestHeaders = new Headers(request.headers);
        signedRequestHeaders.set('Host', new URL(targetUrl).hostname);
        signedRequestHeaders.set('Signature-Input', signatureHeaders.get('Signature-Input'));
        signedRequestHeaders.set('Signature', signatureHeaders.get('Signature'));


       // 4. Forward the fully signed request to the protected API.
        return fetch(targetUrl, { headers: signedRequestHeaders });
    },
};</code></pre>
            <p>We’ll also be creating new <a href="https://developers.cloudflare.com/waf/managed-rules/"><u>managed rulesets</u></a> for our customers that make it easy to allow agents that are using the Trusted Agent Protocol or Agent Pay. You might want to disallow most automated traffic to your storefront but not miss out on revenue opportunities from agents authorized to make a purchase on behalf of a cardholder. A managed rule would make this straightforward to implement. As the website owner, you could enable a managed rule that automatically allows all trusted agents registered with Visa or Mastercard to reach your site, bypassing your other bot protection &amp; WAF rules. </p><p>These protocols will continue to evolve, and we will incorporate feedback to ensure that agent registration and validation work seamlessly across all networks and align with the Web Bot Auth proposal. American Express will also be leveraging Web Bot Auth as the foundation of its agentic commerce offering.</p>
    <div>
      <h2>How to get started today </h2>
      <a href="#how-to-get-started-today">
        
      </a>
    </div>
    <p>You can start building with Cloudflare’s <a href="https://agents.cloudflare.com/"><u>Agent SDK today</u></a>, see a sample implementation of the <a href="https://github.com/visa/trusted-agent-protocol"><u>Trusted Agent Protocol</u></a>, and view the <a href="https://developer.visa.com/capabilities/trusted-agent-protocol/trusted-agent-protocol-specifications"><u>Trusted Agent Protocol</u></a> and <a href="https://www.mastercard.com/us/en/business/artificial-intelligence/mastercard-agent-pay.html"><u>Agent Pay</u></a> docs. </p><p>We look forward to your contributions and feedback, whether that means engaging on GitHub, building apps, or joining mailing list discussions.</p> ]]></content:encoded>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Agents]]></category>
            <category><![CDATA[AI Bots]]></category>
            <guid isPermaLink="false">7EMx28KsZIufcu4wEq5YtV</guid>
            <dc:creator>Rohin Lohe</dc:creator>
            <dc:creator>Will Allen</dc:creator>
        </item>
        <item>
            <title><![CDATA[Giving users choice with Cloudflare’s new Content Signals Policy]]></title>
            <link>https://blog.cloudflare.com/content-signals-policy/</link>
            <pubDate>Wed, 24 Sep 2025 13:10:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare’s Content Signals Policy gives creators a new tool to control use of their content. 
 ]]></description>
            <content:encoded><![CDATA[ <p>If we want to keep the web open and thriving, we need more tools to express how content creators want their data to be used while allowing open access. Today the tradeoff is too limited. Either website operators keep their content open to the web and risk people using it for unwanted purposes, or they move their content behind logins and limit their audience.</p><p>To address the concerns our customers have today about how their content is being used by crawlers and data scrapers, we are launching the Content Signals Policy. This policy is a new addition to robots.txt that allows you to express your preferences for how your content can be used after it has been accessed. </p>
    <div>
      <h2>What <code>robots.txt</code> does, and does not, do today</h2>
      <a href="#what-robots-txt-does-and-does-not-do-today">
        
      </a>
    </div>
    <p><a href="https://www.cloudflare.com/learning/bots/what-is-robots-txt/"><u>Robots.txt</u></a> is a plain text file hosted on your domain that implements the <a href="https://www.rfc-editor.org/rfc/rfc9309.html"><u>Robots Exclusion Protocol</u></a>. It lets you specify which crawlers and bots can access which parts of your site. Many crawlers and some bots obey robots.txt files, but not all do.</p><p>For example, if you wanted to allow all crawlers to access every part of your site, you could host a robots.txt file that has the following:</p>
            <pre><code>User-agent: * 
Allow: /
</code></pre>
            <p>A user-agent is how your browser, or a bot, identifies itself to the resource it is accessing. In this case, the asterisk tells visitors that any user agent, on any device or browser, can access the content. The <code>/</code> in the <code>Allow</code> field tells the visitor that they can access any part of the site as well.</p><p>The <code>robots.txt</code> file can also include commentary by adding characters after the # symbol. Bots and machines will ignore these comments, but it is one way to leave more human-readable notes to someone reviewing the file. Here is <a href="https://www.cloudflare.com/robots.txt"><u>one example</u></a>:</p>
            <pre><code>#    .__________________________.
#    | .___________________. |==|
#    | | ................. | |  |
#    | | ::[ Dear robot ]: | |  |
#    | | ::::[ be nice ]:: | |  |
#    | | ::::::::::::::::: | |  |
#    | | ::::::::::::::::: | |  |
#    | | ::::::::::::::::: | |  |
#    | | ::::::::::::::::: | | ,|
#    | !___________________! |(c|
#    !_______________________!__!
#   /                            \
#  /  [][][][][][][][][][][][][]  \
# /  [][][][][][][][][][][][][][]  \
#(  [][][][][____________][][][][]  )
# \ ------------------------------ /
#  \______________________________/
</code></pre>
            <p>Website owners can make <code>robots.txt</code> more specific by listing certain user-agents (such as for only permitting certain bot user-agents or browser user-agents) and by stating which parts of a site they are or are not allowed to crawl. The example below tells bots to skip crawling the archives path.</p>
            <pre><code>User-agent: * 
Disallow: /archives/
</code></pre>
            <p>And the example here gets more specific, telling Google’s bot to skip crawling the archives path.</p>
            <pre><code>User-agent: Googlebot 
Disallow: /archives/
</code></pre>
            <p>This allows you to specify which crawlers are allowed and what parts of your site they can access. It does not, however, let them know what they are able to do with your content after accessing it. As many have <a href="https://datatracker.ietf.org/wg/aipref/about/"><u>realized,</u></a> there needs to be a standard, machine-readable way to signal the rules of your road for how your data can be used even after it has been accessed. </p><p>That is what the Content Signals Policy allows you to express: your preferences for what a crawler can, and cannot do with your content. </p>
    <div>
      <h2>Why are we launching the Content Signals Policy now? </h2>
      <a href="#why-are-we-launching-the-content-signals-policy-now">
        
      </a>
    </div>
    <p>There are companies that scrape vast troves of data from the Internet every day. There is a real cost to website operators to serve these data scrapers, in particular when they receive no compensation in return; we are experiencing a classic <a href="https://en.wikipedia.org/wiki/Free-rider_problem"><u>free-rider problem</u></a>. This is only going to get worse: we expect bot traffic to exceed human traffic on the Internet by the end of 2029, and by 2031, we anticipate that bot activity alone will surpass the sum of current Internet traffic. </p><p>The de facto defaults of the Internet permitted this. The norm had been that your data would be ingested, but then you, the creator of that content, would get something in return: either referral traffic that you could monetize, or at a minimum some sort of attribution that cited you as the author. Think of the <a href="https://en.wikipedia.org/wiki/Linkback"><u>linkback</u></a> in the early days of blogging, which was a way to give credit to the original creator of the work. No money changed hands, but that attribution drove future discovery and had intrinsic value. This norm has been embedded in many permissive licenses such as <a href="https://en.wikipedia.org/wiki/MIT_License"><u>MIT</u></a> and <a href="https://creativecommons.org/share-your-work/cclicenses/"><u>Creative Commons</u></a>, each of which require attribution back to the original creator. </p><p>That world has changed; that scraped content is now sometimes used to economically compete against the original creator. It’s left many with an <a href="https://blog.cloudflare.com/introducing-ai-crawl-control/"><u>impossible choice</u></a>: do you lock down access to your content and data, or accept the reality of fewer referrals and minimal attribution? If the only recourse is the former, the open transmission of ideas on the web is harmed and newer entrants to the AI ecosystem are put at an unfair disadvantage for their efforts to train new models. </p>
    <div>
      <h2>The Cloudflare Content Signals Policy</h2>
      <a href="#the-cloudflare-content-signals-policy">
        
      </a>
    </div>
    <p>The Content Signals Policy integrates into website operators’ robots.txt files. It is human-readable text following the # symbol to designate it as a comment. This policy defines three content signals (search, ai-input, and ai-train) and explains how crawlers should interpret them.</p><p>A website operator can then optionally express their preferences via machine-readable content signals.</p>
            <pre><code># As a condition of accessing this website, you agree to abide by the following content signals:

# (a)  If a content-signal = yes, you may collect content for the corresponding use.
# (b)  If a content-signal = no, you may not collect content for the corresponding use.
# (c)  If the website operator does not include a content signal for a corresponding use, the website operator neither grants nor restricts permission via content signal with respect to the corresponding use.

# The content signals and their meanings are: 

# search: building a search index and providing search results (e.g., returning hyperlinks and short excerpts from your website's contents).  Search does not include providing AI-generated search summaries.
# ai-input: inputting content into one or more AI models (e.g., retrieval augmented generation, grounding, or other real-time taking of content for generative AI search answers). 
# ai-train: training or fine-tuning AI models.

# ANY RESTRICTIONS EXPRESSED VIA CONTENT SIGNALS ARE EXPRESS RESERVATIONS OF RIGHTS UNDER ARTICLE 4 OF THE EUROPEAN UNION DIRECTIVE 2019/790 ON COPYRIGHT AND RELATED RIGHTS IN THE DIGITAL SINGLE MARKET. </code></pre>
            <p>There are three parts to this text:</p><ul><li><p>The first paragraph explains to companies how to interpret any given content signal. “Yes” means go, “no” means stop, and the absence of a signal conveys no meaning. That final, neutral option is important: it lets website operators express a preference with respect to one content signal without requiring them to do so for another.</p></li><li><p>The second paragraph defines the content signals vocabulary. We kept the signals simple to make it easy for anyone accessing content to abide by them.</p></li><li><p>The final paragraph reminds those automating access to data that these content signals may carry legal significance in various jurisdictions.</p></li></ul><p>A website operator can then announce their specific preferences in machine-readable text using comma-delimited, ‘yes’ or ‘no’ syntax. If a website operator wants to allow search, disallow training, and express no preference regarding ai-input, they could include the following in their robots.txt:</p>
            <pre><code>User-Agent: *
Content-Signal: search=yes, ai-train=no 
Allow: / 
</code></pre>
            <p>If a website operator leaves the content signal for ai-input blank like in the above example, it does not mean they have no preference regarding that use; it just means they have not used this part of their robots.txt file to express it.</p>
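<p>Hypothetically, a crawler honoring these signals could parse the line with a few lines of code. This is a sketch of ours, not part of the policy: the function name is illustrative, and absent signals are left undefined so that "no preference expressed" stays distinguishable from "no".</p>

```javascript
// Minimal, illustrative parser for a Content-Signal line from robots.txt.
function parseContentSignal(line) {
  // Strip the field name, then read comma-delimited name=yes|no pairs.
  const value = line.replace(/^Content-Signal:\s*/i, "");
  const signals = {};
  for (const part of value.split(",")) {
    const [name, pref] = part.trim().split("=");
    if (name && (pref === "yes" || pref === "no")) {
      signals[name.toLowerCase()] = pref;
    }
  }
  // Signals the operator did not set stay undefined: neither granted nor restricted.
  return signals;
}
```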
    <div>
      <h2>How to add content signals to your website</h2>
      <a href="#how-to-add-content-signals-to-your-website">
        
      </a>
    </div>
    <p>If you already know how to configure your robots.txt file, deploying content signals is as simple as adding the Content Signals Policy above and then defining your preferences via a content signal.  </p><p>We want to make adopting content signals simple. Cloudflare customers have already turned on our managed robots.txt feature for over 3.8 million domains. By doing so, they have chosen to instruct companies that they <a href="https://www.cloudflare.com/learning/ai/how-to-block-ai-crawlers/">do not want the content on those domains to be used for AI training</a>. For these customers, we will update the robots.txt file that we already serve on their behalf to include the Content Signals Policy and the following signals:</p>
            <pre><code>Content-Signal: search=yes, ai-train=no</code></pre>
            <p>We will not serve an “ai-input” signal for our managed robots.txt customers. We don’t know their preference with respect to that signal, and we don’t want to guess.</p><p>Starting today, we also will serve the commented, human-readable Content Signals Policy for any free customer zone that does not have an existing robots.txt file. In practice, that means a request to robots.txt on that domain would return the comments that define what content signals are. These comments are ignored by crawlers. Importantly, it will not include any Allow or Disallow directives, nor will it serve any actual content signals. Users are the ones to choose and express their actual preferences if and when they are ready to do so. Customers with an existing robots.txt file will see no change.</p><p>Zones on a free plan can turn off the Content Signals Policy in the Security Settings section of the Cloudflare dashboard, as well as via the Overview section.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/69VPgMTwoI1KqUTP4cNqG5/9576a3ca6eeee93b58688aea7f7ff0ae/BLOG-2956_2.png" />
          </figure><p>To create your own content signals, just copy and paste the text that we help you generate at <a href="http://contentsignals.org"><u>ContentSignals.org</u></a> into your <code>robots.txt</code> file, or immediately deploy via the Deploy to Cloudflare button. You can alternatively turn on our <a href="https://developers.cloudflare.com/bots/additional-configurations/managed-robots-txt/"><u>managed robots.txt feature</u></a> if you would like to express your preference to disallow training.</p><p>It’s important to remember that content signals express preferences; they are not <a href="https://www.cloudflare.com/learning/ai/how-to-prevent-web-scraping/">technical countermeasures against scraping</a>. Some companies might simply ignore them. If you are a website publisher seeking to control what others do with your content, we think it is best to combine your content signals with <a href="https://developers.cloudflare.com/waf/"><u>WAF</u></a> rules and <a href="https://www.cloudflare.com/application-services/products/bot-management/"><u>Bot Management</u></a>.</p><p>While these Cloudflare features aim to make content signals easier to use, we want to encourage adoption by anyone, anywhere. To promote this practice, we are releasing the policy under a <a href="https://creativecommons.org/publicdomain/zero/1.0/"><u>CC0 License</u></a>, which allows anyone to implement and use it freely.</p>
    <div>
      <h2>What’s next</h2>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>Our customers are fully in the driver’s seat for what crawlers they want to allow and what they’d like to block. Some want to write for the superintelligence, others want more control: we think they should be the ones to decide.</p><p>Content signals allow anyone to express how they want their content to be used after it has been accessed. Enabling the ability to express preferences was overdue. </p><p>We know there’s more work to do. Signaling the rules of the road only works if others recognize those rules. That’s why we’ll continue to work in standards bodies to develop and standardize solutions that meet the needs of our customers and are accepted by the broader Internet community.</p><p>We hope you’ll join us in these efforts: the open web is worth fighting for.</p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Policy & Legal]]></category>
            <category><![CDATA[Bot Management]]></category>
            <category><![CDATA[AI]]></category>
            <guid isPermaLink="false">1wk9EDViBe0NsG2Hs8dURz</guid>
            <dc:creator>Will Allen</dc:creator>
        </item>
        <item>
            <title><![CDATA[Launching the x402 Foundation with Coinbase, and support for x402 transactions]]></title>
            <link>https://blog.cloudflare.com/x402/</link>
            <pubDate>Tue, 23 Sep 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare is partnering with Coinbase to create the x402 Foundation and adding x402 support to the Agents SDK & MCP Servers.  ]]></description>
            <content:encoded><![CDATA[ <p>Cloudflare is partnering with Coinbase to create the x402 Foundation. This foundation’s mission will be to encourage the adoption of the <a href="https://github.com/coinbase/x402">x402 protocol</a>, an updated framework that allows clients and services to exchange value on the web using a common language. In addition to today’s partnership, we are shipping a set of features to allow developers to use x402 in the <a href="https://developers.cloudflare.com/agents/x402/"><u>Agents SDK</u></a> and our <a href="https://developers.cloudflare.com/agents/model-context-protocol/"><u>MCP</u></a> integrations, as well as proposing a new deferred payment scheme.</p>
    <div>
      <h3>Payments in the age of agents</h3>
      <a href="#payments-in-the-age-of-agents">
        
      </a>
    </div>
    <p>Payments on the web have historically been designed for humans. We browse a merchant’s website, show intent by adding items to a cart, and confirm our intent to purchase by inputting our credit card information and clicking “Pay.” But what if you want to enable direct transactions between digital services? We need protocols to allow machine-to-machine transactions. </p><p>Every day, sites on Cloudflare send out over a billion HTTP 402 response codes to bots and crawlers trying to access their content and e-commerce stores. This response code comes with a simple message: “Payment Required.”</p><p>Yet these 402 responses too often go unheard. One reason is a lack of standardization. Without a specification for how to format and respond to those response codes, content creators, publishers, and website operators lack adequate tools to convey their payment requests. x402 can give developers a clear, open protocol for websites and automated agents to negotiate payments across the globe. </p>
    <div>
      <h3>A Primer on x402</h3>
      <a href="#a-primer-on-x402">
        
      </a>
    </div>
    <p>Coinbase authored the x402 transaction flow, outlined below, to help machines pay directly for resources over HTTP:</p><ol><li><p>A client attempts to access a resource gated by x402. </p></li><li><p>The server responds with the status code 402 Payment Required. The response body contains payment instructions including the payment amount and recipient.</p></li><li><p>The client requests the x402-gated resource with the payment authorization header.</p></li><li><p>The payment facilitator verifies the client’s payment payload and settles the transaction.</p></li><li><p>The server responds with the requested resource in the response, along with the payment response header that confirms the payment outcome. </p></li></ol><p>This flow creates programmatic access to resources across the Internet. Clients and servers capable of interpreting the x402 protocol are able to transact without the need for accounts, subscriptions, or API keys.</p><p>x402 can be used to monetize traditional use cases, but also enables monetization of a new class of use cases. For example:</p><ul><li><p>An assistant that is able to purchase accessories for your Halloween costume from multiple merchants.</p></li><li><p>An AI agent that pays per browser rendering session, instead of committing to a monthly subscription fee.</p></li><li><p>An autonomous stock trader that makes micropayments for a high quality real-time data feed to drive decisions.</p></li></ul><p>Future versions of x402 could be agnostic of the payment rails, accommodating credit cards and bank accounts in addition to stablecoins. </p>
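<p>From a client's perspective, the five steps above collapse into a single request-retry loop. The sketch below is illustrative rather than an official client: the <code>Payment</code> header name and the <code>buildPaymentAuthorization</code> callback are placeholders for the wallet and signing logic a real implementation would supply.</p>

```javascript
// Illustrative x402 client flow: request, read the 402 payment instructions,
// then re-request with a payment authorization built by the caller's wallet.
// Header and callback names are assumptions for this sketch.
async function fetchWithX402(url, buildPaymentAuthorization) {
  // Step 1: attempt to access the gated resource.
  const first = await fetch(url);
  if (first.status !== 402) return first;

  // Step 2: the 402 body carries the payment amount and recipient.
  const instructions = await first.json();

  // Step 3: re-request with the payment authorization attached.
  const authorization = await buildPaymentAuthorization(instructions);
  // Steps 4 and 5 happen server-side: the facilitator verifies and settles,
  // and the server returns the resource with a payment-confirmation header.
  return fetch(url, { headers: { Payment: authorization } });
}
```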
    <div>
      <h3>Cloudflare’s pay per crawl: proposing the x402 deferred payment scheme </h3>
      <a href="#cloudflares-pay-per-crawl-proposing-the-x402-deferred-payment-scheme">
        
      </a>
    </div>
    <p>Agents and crawlers often require two important functions that already exist in much of today's financial infrastructure: delayed settlement to account for disputes; and a single, aggregated payment to make their accounting simpler. For example, crawlers participating in our <a href="https://blog.cloudflare.com/introducing-pay-per-crawl/"><u>private beta of pay per crawl</u></a> are able to crawl a vast number of pages easily, generate audit logs, and then be charged a single fee via a connected credit card or bank account at the end of each day. </p><p>To account for these types of payment scenarios, we're proposing a new deferred payment scheme for the x402 protocol. This new scheme is specifically designed for agentic payments that don't need immediate settlement and can be handled either through traditional payment methods or stablecoins. By proposing this addition, we're helping to ensure that any compliant server can optionally decouple the cryptographic handshake from the payment settlement itself, giving agents and servers the ability to use pre-negotiated licensing agreements, batch settlements, or subscriptions.</p><p>We will be bringing this new deferred payment scheme to pay per crawl as we expand and evolve the private beta. </p>
    <div>
      <h4>The Handshake Explained</h4>
      <a href="#the-handshake-explained">
        
      </a>
    </div>
    <p>Here’s our initial proposal for the handshake that could be released in the next major version of x402:</p>
    <div>
      <h5>1. The Server’s Offer</h5>
      <a href="#1-the-servers-offer">
        
      </a>
    </div>
    <p>Today, an unauthenticated or unauthorized client attempts to access a resource and receives a <code>402 Payment Required</code> response. The server provides a payment commitment payload that the client can use to construct a re-request. This response is a machine-readable offer, and our proposal includes a new scheme of <b>deferred</b>.</p>
            <pre><code>HTTP/1.1 402 Payment Required
Content-Type: application/json

{
  "accepts": [
    {
      "scheme": "deferred",
      "network": "example-network-provider",
      "resource": "https://example.com/page",
      "...": "...",
      "extras": {
        "id": "abc123",
        "termsUrl": "https://example.com/terms"
      },
    }
  ]
}</code></pre>
            
    <div>
      <h5>2. The Client's Signed Commitment</h5>
      <a href="#2-the-clients-signed-commitment">
        
      </a>
    </div>
    <p>Next, the client re-sends the request with a signed payload containing their payment commitment. The <b>deferred </b>scheme uses HTTP Message Signatures where a <a href="https://datatracker.ietf.org/doc/html/rfc7517?cf_target_id=D4770F028006FD3F2FEE26B65F35A502"><u>JWK-formatted public key</u></a> is available in a hosted directory. The <code>Signature-Input</code> header clearly explains which parts of the request are included in the <code>Signature</code> to serve as cryptographic proof of the client's intent, verifiable by the service provider without an on-chain transaction. </p>
            <pre><code>GET /path/to/resource HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 Chrome/113.0.0 MyBotCrawler/1.1
Payment:
    scheme="deferred",
    network="example-network-provider",
    id="abc123"
Signature-Agent: signer.example.com
Signature-Input:
    sig=("payment" "signature-agent");
    created=1700000000;
    expires=1700011111;
    keyid="ba3e64==";
    tag="web-bot-auth"
Signature: sig=abc==</code></pre>
            
    <div>
      <h5>3. Successful Response</h5>
      <a href="#3-successful-response">
        
      </a>
    </div>
    <p>The resource server validates the signature and returns the content with a confirmation header. The server is responsible for attributing the payment to the account associated with the <b>HTTP message signature</b>, verifying the client's identity and then delivering the content. In this scenario, no blockchain transaction is associated with the payment.</p>
            <pre><code>HTTP/1.1 200 OK
Content-Type: text/html
Payment-Response:
    scheme="deferred",
    network="example-network-provider",
    id="abc123",
    timestamp=1730872968</code></pre>
            
    <div>
      <h5>4. Payment Settlement</h5>
      <a href="#4-payment-settlement">
        
      </a>
    </div>
    <p>The server can now handle the settlement flexibly. The validated id from the handshake acts as a reference for the transaction. This approach enables a flexible use model without per-request overhead, allowing the server to roll up payments on a subscription, daily, or even batch basis. This creates a flexible framework where the cryptographic trust is established immediately, while the financial settlement can use traditional payment rails or stablecoins. </p>
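<p>As a concrete illustration of that rollup, a server might keep a per-account ledger keyed off the validated handshake <code>id</code>s and settle one aggregated charge per period. The structure and names below are ours (amounts are integer cents); nothing here is mandated by the proposal.</p>

```javascript
// Illustrative deferred-settlement ledger: log each verified request, then
// settle a single aggregated amount per account. Names are assumptions.
const ledger = new Map(); // accountId -> { requests, totalCents }

function recordUsage(accountId, priceCents) {
  const entry = ledger.get(accountId) || { requests: 0, totalCents: 0 };
  entry.requests += 1;
  entry.totalCents += priceCents;
  ledger.set(accountId, entry);
}

function settleBatch(accountId) {
  // One aggregated charge replaces many per-request payments; the validated
  // handshake ids behind it serve as the audit trail for disputes.
  const entry = ledger.get(accountId) || { requests: 0, totalCents: 0 };
  ledger.delete(accountId);
  return entry; // hand off to the payment rail of choice
}
```

A daily cron job could call <code>settleBatch</code> per account, mirroring the pay per crawl model of one charge at the end of each day.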
    <div>
      <h3>Cloudflare’s MCP servers, Agents SDK, and x402 payments</h3>
      <a href="#cloudflares-mcp-servers-agents-sdk-and-x402-payments">
        
      </a>
    </div>
    <p>Running code is what moves an open convention from the theoretical to truly useful, and eventually to a recognized standard. Agents built using Cloudflare’s <a href="https://developers.cloudflare.com/agents/x402/"><u>Agent SDK</u></a> can now pay for resources with x402, and MCP servers can expose tools to be paid for via x402. To show how this works, we created the <a href="https://playground.x402.cloudflare.com/"><u>x402 playground</u></a>, a live demo employing x402. The x402 playground is powered by the <a href="https://developers.cloudflare.com/agents/"><u>Agents SDK</u></a> and has access to tools from <a href="https://developers.cloudflare.com/agents/guides/remote-mcp-server/"><u>MCP servers</u></a> deployed on Cloudflare.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5Hhey00HlGPmIO76xbR0qi/0bfd542c222bc6ae8bacb71345dc49d3/image1.png" />
          </figure><p>When you open the x402 playground, a new wallet is created and funded with Testnet USDC on a <a href="https://docs.base.org/learn/deployment-to-testnet/test-networks"><u>Base blockchain testnet</u></a>. The agent, built with Agents SDK, has access to an MCP server with both free and paid tools.</p>
            <pre><code>import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { McpAgent } from "agents/mcp";
import { withX402 } from "agents/x402";

export class PayMCP extends McpAgent {
  server = withX402(
    new McpServer({ name: "PayMCP", version: "1.0.0" }),
    X402_CONFIG
  );

  async init() {
    // Paid tool
    this.server.paidTool(
      "square",
      "Squares a number",
      0.01, // Tool price
      {
        a: z.number()
      },
      {},
      async ({ a }) =&gt; {
        return { content: [{ type: "text", text: String(a ** 2) }] };
      }
    );

    // Free tool
    this.server.tool(
      "add-two-numbers",
      "Adds two numbers",
      {
        a: z.number(),
        b: z.number(),
      },
      async ({ a, b }) =&gt; {
        return { content: [{ type: 'text', text: String(a + b) }] };
      }
    );
  }
}</code></pre>
            <p>When the agent attempts to use a paid tool, the MCP server responds with a 402 Payment Required. The agent is able to interpret the payment instructions and prompt the human whether they want to proceed with the transaction. Building an x402-compatible client requires a basic wrapper on the tool call:</p>
            <pre><code>import { Agent } from "agents";
import { withX402Client } from "agents/x402";

export class MyAgent extends Agent {
  // Your Agent definitions...

  async onToolCall(toolName, toolArgs) {

    // Build the x402 client
    const x402Client = withX402Client(
      myMcpClient,
      { network: "base-sepolia", account: this.account }
    );

    // The first parameter becomes the confirmation callback.
    // We can set it to `null` if we want the agent to pay automatically.
    const res = await x402Client.callTool(
      this.onPaymentRequired,
      {
        name: toolName,
        arguments: toolArgs
      }
    );
  }
}</code></pre>
            <p>This test agent draws down the funds from the wallet and sends the payment payload to the MCP server, which settles the transaction. The transactions can be specified to execute with or without human confirmation, allowing you to design the interface best suited for your application.</p>
    <div>
      <h3>What’s next? </h3>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>You can get started today by using the <a href="https://developers.cloudflare.com/agents/x402/"><u>Agents SDK</u></a> or by deploying your own <a href="https://developers.cloudflare.com/agents/guides/remote-mcp-server/"><u>MCP server</u></a>.</p><p>We’ll continue to work closely with Coinbase to establish the x402 Foundation. Stay tuned for more announcements on the specifics of the structure very soon.</p><p>We believe in the value of open and interoperable protocols – which is why we are encouraging everyone to contribute to the <a href="https://github.com/coinbase/x402"><u>x402 protocol directly</u></a>. To get in touch with the team at Cloudflare working on x402, email us at <a><u>x402@cloudflare.com</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Partners]]></category>
            <category><![CDATA[Coinbase]]></category>
            <category><![CDATA[x402]]></category>
            <guid isPermaLink="false">blhpo3poPntnuPzSzX29s</guid>
            <dc:creator>Will Allen</dc:creator>
            <dc:creator>Cam Whiteside</dc:creator>
            <dc:creator>Rohin Lohe</dc:creator>
            <dc:creator>Steve James</dc:creator>
        </item>
        <item>
            <title><![CDATA[The next step for content creators in working with AI bots: Introducing AI Crawl Control]]></title>
            <link>https://blog.cloudflare.com/introducing-ai-crawl-control/</link>
            <pubDate>Thu, 28 Aug 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare launches AI Crawl Control (formerly AI Audit) and introduces easily customizable 402 HTTP responses. ]]></description>
            <content:encoded><![CDATA[ <p><i>Empowering content creators in the age of AI with smarter crawling controls and direct communication channels</i></p><p>Imagine you run a regional news site. Last month an AI bot scraped 3 years of archives in minutes — with no payment and little to no referral traffic. As a small company, you may struggle to get the AI company's attention for a licensing deal. Do you block all crawler traffic, or do you let them in and settle for the few referrals they send? </p><p>It’s picking between two bad options.</p><p>Cloudflare wants to help break that stalemate. On July 1st of this year, we declared <a href="https://www.cloudflare.com/press-releases/2025/cloudflare-just-changed-how-ai-crawlers-scrape-the-internet-at-large/"><u>Content Independence Day</u></a> based on a simple premise: creators deserve control of how their content is accessed and used. Today, we're taking the next step in that journey by releasing AI Crawl Control to general availability — giving content creators and AI crawlers an important new way to communicate.</p>
    <div>
      <h2>AI Crawl Control goes GA</h2>
      <a href="#ai-crawl-control-goes-ga">
        
      </a>
    </div>
    <p>Today, we're rebranding our AI Audit tool as <b>AI Crawl Control</b> and moving it from beta to <b>general availability</b>. This reflects the tool's evolution from simple monitoring to detailed insights and <a href="https://www.cloudflare.com/learning/ai/how-to-block-ai-crawlers/">control over how AI systems can access your content</a>. </p><p>The market response has been overwhelming: content creators across industries needed real agency, not just visibility. AI Crawl Control delivers that control.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/pIAbmCR0tTK71umann3w0/e570c5f898e3d399babf6d1f82c2f3d8/image3.png" />
          </figure>
    <div>
      <h2>Using HTTP 402 to help publishers license content to AI crawlers</h2>
      <a href="#using-http-402-to-help-publishers-license-content-to-ai-crawlers">
        
      </a>
    </div>
    <p>Many content creators have faced a binary choice: either block all AI crawlers and miss potential licensing opportunities and referral traffic, or allow them through without any compensation. They had no practical way to say "we're open for business, but let's talk terms first."</p><p>Our customers are telling us:</p><ul><li><p>We want to license our content, but crawlers don't know how to reach us. </p></li><li><p>Blanket blocking feels like we're closing doors on potential revenue and referral traffic. </p></li><li><p>We need a way to communicate our terms before crawling begins. </p></li></ul><p>To address these needs, we are making it easier than ever to send customizable <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/402">402 HTTP status codes</a>. </p><p>Our <a href="https://blog.cloudflare.com/introducing-pay-per-crawl/#what-if-i-could-charge-a-crawler"><u>private beta launch of Pay Per Crawl</u></a> put the HTTP 402 (“Payment Required”) response code to use, working in tandem with Web Bot Auth to enable direct payments between agents and content creators. Today, we’re making customizable 402 response codes available to every paid Cloudflare customer — not just pay per crawl users.</p><p>Here's how it works: in AI Crawl Control, paying Cloudflare customers will be able to select individual bots to block with a configurable message parameter and send 402 Payment Required responses. Think: "To access this content, email partnerships@yoursite.com or call 1-800-LICENSE" or "Premium content available via API at api.yoursite.com/pricing."</p><p>On an average day, Cloudflare customers are already sending over one billion 402 response codes. This shows a deep desire to move beyond blocking to open communication channels and new monetization models. 
With the 402 HTTP status code, content creators can tell crawlers exactly how to properly license their content, creating a direct path from crawling to a commercial agreement. We are excited to make this easier than ever in the AI Crawl Control dashboard. </p>
    <div>
      <h2>How to customize your 402 status code with AI Crawl Control: </h2>
      <a href="#how-to-customize-your-402-status-code-with-ai-crawl-control">
        
      </a>
    </div>
    <p><b>For Paid Plan Users:</b></p><ul><li><p>When you block individual crawlers from the AI Crawl Control dashboard, you can now choose to send 402 Payment Required status codes and customize your message. For example: <b>To access this content, email partnerships@yoursite.com or call 1-800-LICENSE</b>.</p></li></ul><p>The response will look like this:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5v5x41azcAK14DBhXjXPEX/8c0960b4bb556d62e88d19c9dd544f12/image4.png" />
          </figure><p>The message can be configured from Settings in the AI Crawl Control Dashboard:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2KMdRYwoey9RdYIxmzmFO1/7b39fd82d43349ee1cc4832cb602eb56/image1.png" />
          </figure>
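<p>The dashboard internals aren't public, but the behavior described above can be sketched as a small handler: blocked AI crawlers get a 402 carrying the configured licensing message instead of a bare block. Everything here (the bot list, the message, the matching by User-Agent substring) is illustrative — real matching would rely on verified bot identity, not headers alone.</p>

```typescript
// Illustrative sketch only: answer blocked AI crawlers with a 402 and a
// configurable licensing message, rather than a bare 403 block.
const PAYMENT_MESSAGE =
  "To access this content, email partnerships@yoursite.com or call 1-800-LICENSE";

// Hypothetical user-agent substrings to match; production matching would use
// verified bot identity, not just the User-Agent header.
const BLOCKED_BOTS = ["ExampleAIBot", "OtherCrawler"];

function handleRequest(userAgent: string): { status: number; body: string } {
  const isBlockedBot = BLOCKED_BOTS.some((bot) => userAgent.includes(bot));
  if (isBlockedBot) {
    // 402 Payment Required: the crawler is refused, but told how to license.
    return { status: 402, body: PAYMENT_MESSAGE };
  }
  // Ordinary visitors get the page as usual.
  return { status: 200, body: "<html>...</html>" };
}
```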
    <div>
      <h2>Beyond just blocking AI bots</h2>
      <a href="#beyond-just-blocking-ai-bots">
        
      </a>
    </div>
    <p>This is just the beginning. We're planning to add additional parameters that will let crawlers understand the content's value, freshness, and licensing terms directly in the 402 response. Imagine crawlers receiving structured data about content quality and update frequency, for example, in addition to contact information.</p><p>Meanwhile, <a href="https://blog.cloudflare.com/introducing-pay-per-crawl/">pay per crawl</a> continues advancing through beta, giving content creators the infrastructure to automatically monetize crawler access with transparent, usage-based pricing.</p><p>What excites us most is the market shift we're seeing. We're moving to a world where content creators have clear monetization paths to become active participants in the development of rich AI experiences. </p><p>The 402 response is a bridge between two industries that want to work together: content creators whose work fuels AI development, and AI companies who need high-quality data. Cloudflare’s AI Crawl Control creates the infrastructure for these partnerships to flourish.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/31Np3qX2ssbeGaJnZHQodA/92246d3618778715c2e8b295b7acaa29/image5.png" />
          </figure><div>
  
</div><p></p> ]]></content:encoded>
            <category><![CDATA[AI Week]]></category>
            <category><![CDATA[Pay Per Crawl]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[AI Bots]]></category>
            <category><![CDATA[Bot Management]]></category>
            <guid isPermaLink="false">3UcNgGUfIUIm0EEtNwgLAT</guid>
            <dc:creator>Will Allen</dc:creator>
            <dc:creator>Pulkita Kini</dc:creator>
            <dc:creator>Cam Whiteside</dc:creator>
        </item>
        <item>
            <title><![CDATA[Introducing pay per crawl: Enabling content owners to charge AI crawlers for access]]></title>
            <link>https://blog.cloudflare.com/introducing-pay-per-crawl/</link>
            <pubDate>Tue, 01 Jul 2025 10:00:00 GMT</pubDate>
            <description><![CDATA[ Pay per crawl is a new feature to allow content creators to charge AI crawlers for access to their content.  ]]></description>
            <content:encoded><![CDATA[ 
    <div>
      <h2>A changing landscape of consumption </h2>
      <a href="#a-changing-landscape-of-consumption">
        
      </a>
    </div>
    <p>Many publishers, content creators and website owners currently feel like they have a binary choice — either leave the front door wide open for AI to consume everything they create, or create their own walled garden. But what if there was another way?</p><p>At Cloudflare, we started from a simple principle: we wanted content creators to have control over who accesses their work. If a creator wants to <a href="https://www.cloudflare.com/learning/ai/how-to-block-ai-crawlers/">block all AI crawlers</a> from their content, they should be able to do so. If a creator wants to allow some or all AI crawlers full access to their content for free, they should be able to do that, too. Creators should be in the driver’s seat.</p><p>After hundreds of conversations with news organizations, publishers, and large-scale social media platforms, we heard a consistent desire for a third path: They’d like to allow AI crawlers to access their content, but they’d like to get compensated. Currently, that requires knowing the right individual and striking a one-off deal, which is an insurmountable challenge if you don’t have scale and leverage. </p>
    <div>
      <h2>What if I could charge a crawler? </h2>
      <a href="#what-if-i-could-charge-a-crawler">
        
      </a>
    </div>
    <p>We believe your choice need not be binary — there should be a third, more nuanced option: <b>You can charge for access.</b> Instead of a blanket block or uncompensated open access, we want to empower content owners to monetize their content at Internet scale.</p><p>We’re excited to help dust off a mostly forgotten piece of the web: <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/402"><b><u>HTTP response code 402</u></b></a>.</p>
    <div>
      <h2>Introducing pay per crawl</h2>
      <a href="#introducing-pay-per-crawl">
        
      </a>
    </div>
    <p><a href="http://www.cloudflare.com/paypercrawl-signup/">Pay per crawl</a>, in private beta, is our first experiment in this area. </p><p>Pay per crawl integrates with existing web infrastructure, leveraging <a href="https://www.cloudflare.com/learning/ddos/glossary/hypertext-transfer-protocol-http/">HTTP status codes</a> and established authentication mechanisms to create a framework for paid content access. </p><p>Each time an AI crawler requests content, they either present payment intent via request headers for successful access (<a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/200"><u>HTTP response code 200</u></a>), or receive a <code>402 Payment Required</code> response with pricing. Cloudflare acts as the Merchant of Record for pay per crawl and also provides the underlying technical infrastructure.</p>
    <div>
      <h3>Publisher controls and pricing</h3>
      <a href="#publisher-controls-and-pricing">
        
      </a>
    </div>
    <p>Pay per crawl grants domain owners full control over their monetization strategy. They can define a flat, per-request price across their entire site. Publishers will then have three distinct options for a crawler:</p><ul><li><p><b>Allow:</b> Grant the crawler free access to content.</p></li><li><p><b>Charge:</b> Require payment at the configured, domain-wide price.</p></li><li><p><b>Block:</b> Deny access entirely, with no option to pay.</p></li></ul>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2PhxxI7f3Teb521mPRFQUL/1ecfd01f60f165b35c27ab9457f8b152/image3.png" />
          </figure><p>An important mechanism here is that even if a crawler doesn’t have a billing relationship with Cloudflare, and thus couldn’t be charged for access, a publisher can still choose to ‘charge’ them. This is the functional equivalent of a network level block (an HTTP <code>403 Forbidden</code> response where no content is returned) — but with the added benefit of telling the crawler there could be a relationship in the future. </p><p>While publishers currently can define a flat price across their entire site, they retain the flexibility to bypass charges for specific crawlers as needed. This is particularly helpful if you want to allow a certain crawler through for free, or if you want to negotiate and execute a content partnership outside the pay per crawl feature. </p><p>To ensure integration with each publisher’s existing security posture, Cloudflare enforces Allow or Charge decisions via a rules engine that operates only after existing WAF policies and <a href="https://www.cloudflare.com/learning/bots/what-is-bot-management/">bot management</a> or bot blocking features have been applied.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3NI9GUkR8RmmApQyOgb1mI/4f77c199ccdc5ebc166204cdaec72c48/image2.png" />
          </figure>
    <div>
      <h3>Payment headers and access</h3>
      <a href="#payment-headers-and-access">
        
      </a>
    </div>
    <p>As we were building the system, we knew we had to solve an incredibly important technical challenge: ensuring we could charge a specific crawler, but prevent anyone from spoofing that crawler. Thankfully, there’s a way to do this using <a href="https://developers.cloudflare.com/bots/concepts/bot/verified-bots/web-bot-auth/"><u>Web Bot Auth</u></a> proposals.</p><p>For crawlers, <a href="https://blog.cloudflare.com/web-bot-auth/"><u>this involves:</u></a></p><ul><li><p>Generating an Ed25519 key pair, and making the <a href="https://datatracker.ietf.org/doc/html/rfc7517"><u>JWK</u></a>-formatted public key available in a hosted directory.</p></li><li><p>Registering with Cloudflare to provide the URL of your key directory and user agent information.</p></li><li><p>Configuring your crawler to use <a href="https://datatracker.ietf.org/doc/rfc9421/"><u>HTTP Message Signatures</u></a> with each request.</p></li></ul><p>Once registration is accepted, crawler requests should always include <code>signature-agent</code>, <code>signature-input</code>, and <code>signature</code> headers to identify your crawler and discover paid resources.</p>
            <pre><code>GET /example.html
Signature-Agent: "https://signature-agent.example.com"
Signature-Input: sig2=("@authority" "signature-agent")
 ;created=1735689600
 ;keyid="poqkLGiymh_W0uP6PZFw-dvez3QJT5SolqXBCW38r0U"
 ;alg="ed25519"
 ;expires=1735693200
 ;nonce="e8N7S2MFd/qrd6T2R3tdfAuuANngKI7LFtKYI/vowzk4lAZYadIX6wW25MwG7DCT9RUKAJ0qVkU0mEeLElW1qg=="
 ;tag="web-bot-auth"
Signature: sig2=:jdq0SqOwHdyHr9+r5jw3iYZH6aNGKijYp/EstF4RQTQdi5N5YYKrD+mCT1HA1nZDsi6nJKuHxUi/5Syp3rLWBA==:</code></pre>
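<p>The <code>Signature</code> value above is an Ed25519 signature over a signature base built from the covered components. As a rough sketch of that signing step using Node's built-in <code>node:crypto</code> — note the signature base here is simplified, not the full RFC 9421 construction, which also encodes the <code>created</code>/<code>expires</code>/<code>nonce</code>/<code>tag</code> parameters:</p>

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";

// Generate an Ed25519 key pair; in practice the public key would be published
// (JWK-formatted) in the crawler's hosted key directory.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");

// Simplified signature base covering the components from the example above.
// The real RFC 9421 base also includes the signature parameters.
const signatureBase =
  '"@authority": example.com\n' +
  '"signature-agent": "https://signature-agent.example.com"';

// For Ed25519, Node's sign/verify take `null` as the digest algorithm.
const signature = sign(null, Buffer.from(signatureBase), privateKey);

// The receiving edge verifies using the public key fetched from the directory.
const ok = verify(null, Buffer.from(signatureBase), publicKey, signature);
```

Because verification is bound to the crawler's published public key, another client cannot spoof the crawler's identity without its private key.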
            
    <div>
      <h3>Accessing paid content</h3>
      <a href="#accessing-paid-content">
        
      </a>
    </div>
    <p>Once a crawler is set up, determination of whether content requires payment can happen via two flows:</p>
    <div>
      <h4>Reactive (discovery-first)</h4>
      <a href="#reactive-discovery-first">
        
      </a>
    </div>
    <p>Should a crawler request a paid URL, Cloudflare returns an <code>HTTP 402 Payment Required</code> response, accompanied by a <code>crawler-price</code> header. This signals that payment is required for the requested resource.</p>
            <pre><code>HTTP 402 Payment Required
crawler-price: USD XX.XX</code></pre>
            <p> The crawler can then decide to retry the request, this time including a <code>crawler-exact-price</code> header to indicate agreement to pay the configured price.</p>
            <pre><code>GET /example.html
crawler-exact-price: USD XX.XX </code></pre>
            
    <div>
      <h4>Proactive (intent-first)</h4>
      <a href="#proactive-intent-first">
        
      </a>
    </div>
    <p>Alternatively, a crawler can preemptively include a <code>crawler-max-price</code> header in its initial request.</p>
            <pre><code>GET /example.html
crawler-max-price: USD XX.XX</code></pre>
            <p>If the price configured for a resource is equal to or below this specified limit, the request proceeds, and the content is served with a successful <code>HTTP 200 OK</code> response, confirming the charge:</p>
            <pre><code>HTTP 200 OK
crawler-charged: USD XX.XX 
server: cloudflare</code></pre>
    <p>If the amount in a <code>crawler-max-price</code> request is greater than the content owner’s configured price, only the configured price is charged. However, if the resource’s configured price exceeds the maximum price offered by the crawler, an <code>HTTP 402 Payment Required</code> response is returned, indicating the specified cost. Only a single price declaration header, <code>crawler-exact-price</code> or <code>crawler-max-price</code>, may be used per request.</p><p>The <code>crawler-exact-price</code> or <code>crawler-max-price</code> headers explicitly declare the crawler's willingness to pay. If all checks pass, the content is served, and the crawl event is logged. If any aspect of the request is invalid, the edge returns an <code>HTTP 402 Payment Required</code> response.</p>
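<p>The negotiation rules above condense into a small decision function. This is an illustrative model of the protocol's pricing logic, not Cloudflare's actual edge implementation; prices are plain USD numbers for simplicity.</p>

```typescript
// Illustrative model of the pay per crawl pricing rules, not the real edge
// logic. Prices are plain USD numbers for simplicity.
interface CrawlDecision {
  status: 200 | 402;
  charged?: number; // value echoed in the crawler-charged response header
}

function decideCrawl(
  configuredPrice: number,
  headers: { exactPrice?: number; maxPrice?: number }
): CrawlDecision {
  // Only a single price declaration header may be used per request.
  if (headers.exactPrice !== undefined && headers.maxPrice !== undefined) {
    return { status: 402 };
  }
  // Proactive flow: proceed if the crawler's limit covers the price;
  // only the configured price is ever charged.
  if (headers.maxPrice !== undefined) {
    return headers.maxPrice >= configuredPrice
      ? { status: 200, charged: configuredPrice }
      : { status: 402 };
  }
  // Reactive flow: the crawler retries, agreeing to the configured price.
  if (headers.exactPrice !== undefined) {
    return headers.exactPrice === configuredPrice
      ? { status: 200, charged: configuredPrice }
      : { status: 402 };
  }
  // No payment intent presented: 402 Payment Required with the price.
  return { status: 402 };
}
```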
    <div>
      <h3>Financial settlement</h3>
      <a href="#financial-settlement">
        
      </a>
    </div>
    <p>Crawler operators and content owners must configure pay per crawl payment details in their Cloudflare account. Billing events are recorded each time a crawler makes an authenticated request with payment intent and receives an HTTP 200-level response with a <code>crawler-charged</code> header. Cloudflare then aggregates all the events, charges the crawler, and distributes the earnings to the publisher.</p>
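<p>In effect, settlement is an aggregation over those billing events: sum what each crawler owes and what each publisher earns. A toy version — the event shape here is hypothetical, not Cloudflare's billing schema:</p>

```typescript
// Hypothetical event shape: one record per 200-level response that carried
// a crawler-charged header.
interface CrawlEvent {
  crawler: string;
  publisher: string;
  chargedUsd: number;
}

// Aggregate what each crawler owes and what each publisher earns.
function settle(events: CrawlEvent[]): {
  owedByCrawler: Map<string, number>;
  earnedByPublisher: Map<string, number>;
} {
  const owedByCrawler = new Map<string, number>();
  const earnedByPublisher = new Map<string, number>();
  for (const e of events) {
    owedByCrawler.set(e.crawler, (owedByCrawler.get(e.crawler) ?? 0) + e.chargedUsd);
    earnedByPublisher.set(e.publisher, (earnedByPublisher.get(e.publisher) ?? 0) + e.chargedUsd);
  }
  return { owedByCrawler, earnedByPublisher };
}
```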
    <div>
      <h2>Content for crawlers today, agents tomorrow </h2>
      <a href="#content-for-crawlers-today-agents-tomorrow">
        
      </a>
    </div>
    <p>At its core, pay per crawl begins a technical shift in how content is controlled online. By providing creators with a robust, programmatic mechanism for valuing and controlling their digital assets, we empower them to continue creating the rich, diverse content that makes the Internet invaluable. </p><p>We expect pay per crawl to evolve significantly. It’s very early: we believe many different types of interactions and marketplaces can and should develop simultaneously. We are excited to support these various efforts and open standards.</p><p>For example, a publisher or news organization might want to charge different rates for different paths or content types. How do you introduce dynamic pricing based not only upon demand, but also how many users your AI application has? How do you introduce granular licenses at Internet scale, whether for training, <a href="https://www.cloudflare.com/learning/ai/inference-vs-training/">inference</a>, search, or something entirely new?</p><p>The true potential of pay per crawl may emerge in an <a href="https://www.cloudflare.com/learning/ai/what-is-agentic-ai/">agentic</a> world. What if an agentic paywall could operate entirely programmatically? Imagine asking your favorite deep research program to help you synthesize the latest cancer research or a legal brief, or just help you find the best restaurant in Soho — and then giving that agent a budget to spend to acquire the best and most relevant content. By anchoring our first solution on <b>HTTP response code 402</b>, we enable a future where intelligent agents can programmatically negotiate access to digital resources. </p>
    <div>
      <h2>Getting started</h2>
      <a href="#getting-started">
        
      </a>
    </div>
    <p>Pay per crawl is currently in private beta. We’d love to hear from you if you’re either a crawler interested in paying to access content or a content creator interested in charging for access. You can reach out to us at <a href="http://www.cloudflare.com/paypercrawl-signup/"><u>http://www.cloudflare.com/paypercrawl-signup/</u></a> or contact your Account Executive if you’re an existing Enterprise customer.</p> ]]></content:encoded>
            <category><![CDATA[Pay Per Crawl]]></category>
            <category><![CDATA[AI Bots]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Bot Management]]></category>
            <guid isPermaLink="false">7AJ8tUOFDvk5mCTrDjBPDq</guid>
            <dc:creator>Will Allen</dc:creator>
            <dc:creator>Simon Newton</dc:creator>
        </item>
        <item>
            <title><![CDATA[First-party tags in seconds: Cloudflare integrates Google tag gateway for advertisers ]]></title>
            <link>https://blog.cloudflare.com/google-tag-gateway-for-advertisers/</link>
            <pubDate>Thu, 08 May 2025 18:15:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare introduces a one-click integration with Google tag gateway for advertisers. ]]></description>
            <content:encoded><![CDATA[ <p>If you’re a marketer, advertiser, or a business owner that runs your own website, there’s a good chance you’ve used Google tags in order to collect analytics or measure conversions. A <a href="https://support.google.com/analytics/answer/11994839?hl=en"><u>Google tag</u></a> is a single piece of code you can use across your entire website to send events to multiple destinations like Google Analytics and Google Ads. </p><p>Historically, the common way to deploy a Google tag meant serving the JavaScript payload directly from Google’s domain. This can work quite well, but can sometimes impact performance and accurate data measurement. That’s why Google developed a way to deploy a Google tag using your own first-party infrastructure using <a href="https://developers.google.com/tag-platform/tag-manager/server-side"><u>server-side tagging</u></a>. However, this server-side tagging required deploying and maintaining a separate server, which comes with a cost and requires maintenance.</p><p>That’s why we’re excited to be Google’s launch partner and announce our direct integration of Google tag gateway for advertisers, providing many of the same performance and accuracy benefits of server-side tagging without the overhead of maintaining a separate server.   </p><p>Any <a href="https://www.cloudflare.com/learning/dns/glossary/what-is-a-domain-name/">domain</a> proxied through Cloudflare can now serve your Google tags directly from that domain. This allows you to get better measurement signals for your website and can enhance your campaign performance, with early testers seeing on average an 11% uplift in data signals. The setup only requires a few clicks — if you already have a Google tag snippet on the page, no changes to that tag are required.</p><p>Oh, did we mention it’s free? 
We’ve heard great feedback from customers who participated in a closed beta, and we are excited to open it up to all customers on any <a href="https://www.cloudflare.com/plans/">Cloudflare plan</a> today.      </p>
    <div>
      <h3>Combining Cloudflare’s security and performance infrastructure with Google tag’s ease of use </h3>
      <a href="#combining-cloudflares-security-and-performance-infrastructure-with-google-tags-ease-of-use">
        
      </a>
    </div>
    <p>Google Tag Manager is <a href="https://radar.cloudflare.com/year-in-review/2024#website-technologies"><u>the most used tag management solution</u></a>: it makes a complex tagging ecosystem easy to use and requires less effort from web developers. That’s why we’re collaborating with the Ads measurement and analytics teams at Google to make the integration with Google tag gateway for advertisers as seamless and accessible as possible.</p><p>Site owners will have two places to enable this feature: in the Google tag console, or via the Cloudflare dashboard. When logging into the Google tag console, you’ll see an option to enable Google tag gateway for advertisers in the Admin settings tab. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1QUzHjBrer762UOvypV2Fh/4695fb3996591f001bb02b1be88e41ad/image1.png" />
          </figure><p>Alternatively, if you already know your tag ID and have admin access to your site’s Cloudflare account, you can enable the feature and edit the measurement ID and path directly from the Cloudflare dashboard: </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/amyXiwUzZ0X2V3BGzuOja/b4480e0fe1b420cf7942b0d0957fd6f5/image2.png" />
          </figure>
    <div>
      <h3>Improved performance and measurement accuracy  </h3>
      <a href="#improved-performance-and-measurement-accuracy">
        
      </a>
    </div>
    <p>Before, if site owners wanted to serve first-party tags from their own domain, they had to set up a complex configuration: create a <a href="https://www.cloudflare.com/learning/dns/dns-records/dns-cname-record/">CNAME</a> entry for a new subdomain, create an Origin Rule to forward requests, and a Transform Rule to include geolocation information.</p><p>This new integration dramatically simplifies the setup and makes it a one-click integration by leveraging Cloudflare's position as a <a href="https://www.cloudflare.com/learning/cdn/glossary/reverse-proxy/"><u>reverse proxy</u></a> for your domain. </p><p>In Google Tag Manager’s Admin settings, you can now connect your Cloudflare account and configure your measurement ID directly in Google, and it will push your config to Cloudflare. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2ToLjUY5vNVWV5AGjxeONF/b79b4b32c24080b2461860cea58232a3/image3.png" />
          </figure><p>When you enable the Google tag gateway for advertisers, specific calls to Google’s measurement servers from your website are intercepted and re-routed through your domain. The result: instead of the browser directly requesting the tag script from a Google domain (e.g. <code>www.googletagmanager.com</code>), the request is routed seamlessly through your own domain (e.g. <code>www.example.com/metrics</code>).</p><p>Cloudflare acts as an intermediary for these requests. It first securely fetches the necessary Google tag JavaScript files from Google's servers in the background, then serves these scripts back to the end user's browser from your domain. This makes the request appear as a first-party request.</p><p>A bit more on how this works: When a browser requests <code>https://example.com/gtag/js?id=G-XXXX</code>, Cloudflare intercepts and rewrites the path into the original Google endpoint, preserving all query-string parameters and normalizing the <b>Origin</b> and <b>Referer</b> headers to match Google’s expectations. It then fetches the script on your behalf, and routes all subsequent measurement payloads through the same first-party proxy to the appropriate Google collection endpoints.</p><p>This setup also impacts how cookies are stored from your domain. A <a href="https://www.cloudflare.com/learning/privacy/what-are-cookies/"><u>cookie</u></a> is a small text file that a website asks your browser to store on your computer. When you visit other pages on that same website, or return later, your browser sends that cookie back to the website's server. This allows the site to remember information about you or your preferences, like whether a user is logged in, items in a shopping cart, or, in the case of analytics and advertising, an identifier to recognize your browser across visits.</p><p>With Cloudflare’s integration with Google tag gateway for advertisers, the tag script itself is delivered <i>from your own domain</i>. 
When this script instructs the browser to set a cookie, the cookie is created and stored under your website's domain. </p>
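<p>The path mapping described above can be pictured as a small rewrite function: same path and query string, different origin. This is a simplified model for illustration only — the real gateway also normalizes headers and proxies the measurement payloads — and the <code>/gtag/</code> prefix check stands in for whatever first-party path is configured.</p>

```typescript
// Simplified model of the first-party rewrite: map a tag request on your own
// domain back to the Google endpoint, preserving all query-string parameters.
// The real gateway also normalizes Origin/Referer and proxies measurement
// payloads; this only illustrates the path mapping.
const GOOGLE_TAG_ORIGIN = "https://www.googletagmanager.com";

function rewriteTagRequest(requestUrl: string): string | null {
  const url = new URL(requestUrl);
  // Only requests under the configured first-party path are rewritten.
  if (!url.pathname.startsWith("/gtag/")) return null;
  // Same path and query string, different origin.
  return GOOGLE_TAG_ORIGIN + url.pathname + url.search;
}
```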
    <div>
      <h3>How can I get started? </h3>
      <a href="#how-can-i-get-started">
        
      </a>
    </div>
    <p>Detailed instructions to get started can be found <a href="https://developers.cloudflare.com/google-tag-gateway/"><u>here</u></a>. You can also log in to your Cloudflare Dashboard, navigate to the Engagement Tab, and select Google tag gateway in the navigation to set it up directly in the Cloudflare dashboard.</p> ]]></content:encoded>
            <category><![CDATA[Advertising]]></category>
            <category><![CDATA[Analytics]]></category>
            <category><![CDATA[Google Analytics]]></category>
            <category><![CDATA[Google]]></category>
            <guid isPermaLink="false">3wpdZp6NrwT8NcND208zZT</guid>
            <dc:creator>Will Allen</dc:creator>
            <dc:creator>Nikhil Kothari</dc:creator>
        </item>
        <item>
            <title><![CDATA[Make your apps truly interactive with Cloudflare Realtime and RealtimeKit ]]></title>
            <link>https://blog.cloudflare.com/introducing-cloudflare-realtime-and-realtimekit/</link>
            <pubDate>Wed, 09 Apr 2025 14:05:00 GMT</pubDate>
            <description><![CDATA[ Announcing Cloudflare Realtime and RealtimeKit, a complete toolkit for shipping real-time audio and video apps in days with SDKs for Kotlin, React Native, Swift, JavaScript, and Flutter. ]]></description>
            <content:encoded><![CDATA[ <p>Over the past few years, we’ve seen developers push the boundaries of what’s possible with real-time communication — tools for collaborative work, massive online watch parties, and interactive live classrooms are all exploding in popularity.</p><p>We use AI more and more in our daily lives. Text-based interactions are evolving into something more natural: voice and video. When users interact with the applications and tools that AI developers create, we have high expectations for response time and connection quality. Complex applications of AI are built on not just one tool, but a combination of tools, often from different providers, which requires a well-connected cloud in the middle to coordinate the different AI tools.</p><p>Developers already use <a href="https://developers.cloudflare.com/workers/"><u>Workers</u></a>, <a href="https://developers.cloudflare.com/workers-ai/"><u>Workers AI</u></a>, and our WebRTC <a href="https://developers.cloudflare.com/calls/"><u>SFU</u></a> and <a href="https://developers.cloudflare.com/calls/turn/"><u>TURN</u></a> services to build powerful apps without needing to think about coordinating compute or media services to be closest to their user. It’s only natural for there to be a singular <a href="https://blog.cloudflare.com/best-place-region-earth-inference/"><u>"Region: Earth"</u></a> for real-time applications.</p><p>We're excited to introduce <a href="https://realtime.cloudflare.com"><u>Cloudflare Realtime</u></a> — a suite of products to help you make your apps truly interactive with real-time audio and video experiences. Cloudflare Realtime now brings together our SFU, STUN, and TURN services, along with the new RealtimeKit.</p>
    <div>
      <h2>Say hello to RealtimeKit</h2>
      <a href="#say-hello-to-realtimekit">
        
      </a>
    </div>
    <p>RealtimeKit is a collection of mobile SDKs (iOS, Android, React Native, Flutter), SDKs for the Web (React, Angular, vanilla JS, WebComponents), and server side services (recording, coordination, transcription) that make it easier than ever to build real-time voice, video, and AI applications. RealtimeKit also includes user interface components to build interfaces quickly. </p><p>The amazing team behind <a href="https://dyte.io/"><u>Dyte</u></a>, a leading company in the real-time ecosystem, joined Cloudflare to accelerate the development of RealtimeKit. The Dyte team spent years focused on making real-time experiences accessible to developers of all skill levels, and had a deep understanding of the developer journey — they built abstractions that hid WebRTC's complexity without removing its power.</p><p>Already a user of Cloudflare’s products, Dyte was a perfect complement to Cloudflare’s existing real-time infrastructure spanning 300+ cities worldwide. They built a developer experience layer that made complex media capabilities accessible. We’re incredibly excited for their team to join Cloudflare as we help developers define the future of user interaction for real-time applications as one team.</p>
    <div>
      <h2>Interactive applications shouldn't require WebRTC expertise </h2>
      <a href="#interactive-applications-shouldnt-require-webrtc-expertise">
        
      </a>
    </div>
    <p>For many developers, what starts as "let's add video chat" can quickly escalate into weeks of technical deep dives into WebSockets and WebRTC. While we are big believers in the <a href="https://blog.cloudflare.com/tag/webrtc/"><u>potential of WebRTC</u></a>, we also know that it comes with real challenges when building for the first time. Debugging WebRTC sessions can require developers to learn about esoteric new concepts such as navigating <a href="https://webrtcforthecurious.com/docs/03-connecting/#ice"><u>ICE candidate failures</u></a>, <a href="https://webrtcforthecurious.com/docs/03-connecting/#turn"><u>TURN server configurations</u></a>, and <a href="https://webrtcforthecurious.com/docs/03-connecting/#turn"><u>SDP negotiation issues</u></a>.</p><p>The challenges of building a WebRTC app for the first time don’t stop there. Device management adds another layer of complexity. Inconsistent camera and microphone APIs across browsers and mobile platforms introduce unexpected behaviors in production. Chrome handles resolution switching one way, Safari another, and Android WebViews break in uniquely frustrating ways. We regularly see applications that function perfectly in testing environments fail mysteriously when deployed to certain devices or browsers.</p><p>Systems that work flawlessly with 5 test users collapse under the load of 50 real-world participants. Bandwidth adaptation falters, connection management becomes unwieldy, and maintaining consistent quality across diverse network conditions proves nearly impossible without specialized expertise. </p><p>What starts as a straightforward feature becomes a multi-month project requiring low-level engineering to solve problems that aren’t core to your business.</p><p>We realized that we needed to extend our products to client devices to help solve these problems.</p>
    <div>
      <h2>RealtimeKit SDKs for Kotlin, React Native, Swift, JavaScript, Flutter</h2>
      <a href="#realtimekit-sdks-for-kotlin-react-native-swift-javascript-flutter">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/20EM65tMDpRznldLcfRYSo/90db0a5576bcecf0eaa3d28f7feaa65e/Final.png" />
          </figure><p>RealtimeKit is our toolkit for building real-time applications without common WebRTC headaches. The core of RealtimeKit is a set of cross-platform SDKs that handle all the low-level complexities, from session establishment and media permissions to NAT traversal and connection management. Instead of spending weeks implementing and debugging these foundations, you can focus entirely on creating unique experiences for your users.</p><p>Recording capabilities come built-in, eliminating one of the most commonly requested yet difficult-to-implement features in real-time applications. Whether you need to capture meetings for compliance, save virtual classroom sessions for students who couldn't attend live, or enable content creators to archive their streams, RealtimeKit handles the entire media pipeline. No more wrestling with MediaRecorder APIs or building custom recording infrastructure — it just works, scaling alongside your user base.</p><p>We've also integrated voice AI capabilities from providers like ElevenLabs directly into the platform. Adding AI participants to conversations becomes as simple as a function call, opening up entirely new interaction models. These AI voices operate with the same low latency as human participants — tens of milliseconds across our global network — creating truly synchronous experiences where AI and humans converse naturally. Combined with RealtimeKit's ability to scale to millions of concurrent participants, this enables entirely new categories of applications that weren't feasible before.</p>
    <div>
      <h2>The Developer Experience</h2>
      <a href="#the-developer-experience">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7GAxgCMn36QUgxSlF7m0xL/34574d1d1ba3da305e46b41bc455e769/2.png" />
          </figure><p>RealtimeKit focuses on what developers want to accomplish, rather than how the underlying protocols work. Adding participants or turning on recording are just an API call away. SDKs handle device enumeration, permission requests, and UI rendering across platforms. Behind the scenes, we’re solving the thorny problems of media orchestration and state management that can be challenging to debug.</p><p>We’ve been quietly working towards launching the Cloudflare RealtimeKit for years. From the very beginning, our global network has been optimized for minimizing latency between our network and end users, which is where the majority of network disruptions are introduced.</p><p>We developed a <a href="https://blog.cloudflare.com/cloudflare-calls-anycast-webrtc/"><u>Selective Forwarding Unit (SFU)</u></a> that intelligently routes media streams between participants, dynamically adjusting quality based on network conditions. Our <a href="https://blog.cloudflare.com/lt-lt/webrtc-turn-using-anycast/"><u>TURN infrastructure</u></a> solves the <a href="https://webrtchacks.com/an-intro-to-webrtcs-natfirewall-problem/"><u>complex problem of NAT traversal</u></a>, allowing connections to be established reliably behind firewalls. With Workers AI, we brought inference capabilities to the edge, minimizing latency for AI-powered interactions. Workers and Durable Objects provided the WebSockets coordination layer necessary for maintaining consistent state across participants.</p>
    <div>
      <h2>SFU and TURN services are now Generally Available</h2>
      <a href="#sfu-and-turn-services-are-now-generally-available">
        
      </a>
    </div>
    <p>We’re also announcing the General Availability of our SFU and TURN services for WebRTC developers who need more control and low-level integration with the Cloudflare network.</p><p>The SFU now supports simulcast, a very common feature request. Simulcast allows clients to choose among multiple versions of the same media stream, similar to selecting the quality level of an online video, but for WebRTC. Users on different network conditions can now receive different quality levels, either selected automatically by the SFU or chosen manually.</p><p>Our TURN service now offers advanced analytics with insight into region-, country-, and city-level usage metrics. Together with <a href="https://developers.cloudflare.com/calls/turn/replacing-existing/#tag-users-with-custom-identifiers"><u>Custom Identifiers</u></a> and revocable tokens, Cloudflare’s TURN service offers an in-depth view into usage and helps prevent abuse.</p><p>Our SFU and TURN products remain one of the most affordable ways to build WebRTC apps at scale, at 5 cents per GB after 1,000 GB of free usage each month.</p>
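<p>The pricing above is easy to sanity-check with a quick calculation. Below is a minimal sketch; the helper function is ours for illustration, not part of any Cloudflare API.</p>

```python
# Estimate the monthly SFU/TURN bill: the first 1,000 GB each month are
# free, and usage beyond that is billed at 5 cents per GB.
# Illustrative helper only, not an official Cloudflare API.
def estimate_monthly_cost(usage_gb, free_gb=1000.0, rate_per_gb=0.05):
    billable_gb = max(0.0, usage_gb - free_gb)
    return billable_gb * rate_per_gb

print(estimate_monthly_cost(800))   # within the free tier: 0.0
print(estimate_monthly_cost(3500))  # 2,500 billable GB at $0.05: 125.0
```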
    <div>
      <h2>Partnering with Hugging Face to make realtime AI communication seamless</h2>
      <a href="#partnering-with-hugging-face-to-make-realtime-ai-communication-seamless">
        
      </a>
    </div>
    <p><a href="https://fastrtc.org/"><u>FastRTC</u></a> is a lightweight Python library from Hugging Face that makes it easy to stream real-time audio and video into and out of AI models using WebRTC. TURN servers are a critical part of WebRTC infrastructure and ensure that media streams can reliably connect across firewalls and NATs. For users of FastRTC, setting up a globally distributed TURN server can be complex and expensive.  </p><p>Through our new partnership with Hugging Face, FastRTC users now have free access to Cloudflare’s TURN Server product, giving them reliable connectivity out of the box. Developers get 10 GB of TURN bandwidth each month using just a Hugging Face access token — no setup, no credit card, no servers to manage. As projects grow, they can easily switch to a Cloudflare account for more capacity and a larger free tier.</p><p>This integration allows AI developers to focus on building voice interfaces, video pipelines, and multimodal apps without worrying about NAT traversal or network reliability. FastRTC simplifies the code, and Cloudflare ensures it works everywhere. See these <a href="https://huggingface.co/fastrtc"><u>demos</u></a> to get started.</p>
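<p>In a browser, TURN credentials like these ultimately end up in the ICE server list of a WebRTC peer connection configuration. As a rough sketch of the shape involved — note that the TURN URL, username, and credential below are illustrative placeholders; FastRTC fetches real, short-lived credentials for you:</p>

```python
# Shape of the ICE server configuration that a WebRTC client consumes.
# The TURN URL, username, and credential here are placeholders:
# in practice, short-lived credentials are issued per session.
def build_rtc_configuration(username, credential,
                            turn_url="turn:turn.example.com:3478?transport=udp"):
    return {
        "iceServers": [
            {
                "urls": [turn_url],
                "username": username,
                "credential": credential,
            }
        ]
    }

config = build_rtc_configuration("demo-user", "demo-credential")
```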
    <div>
      <h2>Ship AI-powered realtime apps in days, not weeks</h2>
      <a href="#ship-ai-powered-realtime-apps-in-days-not-weeks">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1b4lK5Qvq1ImlBa3lFEH7l/bf212f51a1f178285747e759c1365ec9/3.png" />
          </figure><p>With RealtimeKit, developers can now implement complex real-time experiences in hours. The SDKs abstract away the most time-consuming aspects of WebRTC development while providing APIs tailored to common implementation patterns. Here are a few of the possibilities: </p><ul><li><p><b>Video conferencing</b>: Add multi-participant video calls to your application with just a few lines of code. RealtimeKit handles the connection management, bandwidth adaptation, and device permissions that typically consume weeks of development time.</p></li><li><p><b>Live streaming</b>: Build interactive broadcasts where hosts can stream to thousands of viewers while selectively bringing participants on-screen. The SFU automatically optimizes media routing based on participant roles and network conditions.</p></li><li><p><b>Real-time synchronization</b>: Implement watch parties or collaborative viewing experiences where content playback stays synchronized across all participants. The timing API handles the complex delay calculations and adjustments traditionally required.</p></li><li><p><b>Voice AI integrations</b>: Add transcription and AI voice participants without building custom media pipelines. RealtimeKit's media processing APIs integrate with your existing authentication and storage systems rather than requiring separate infrastructure.</p></li></ul><p>Watching our early testers use RealtimeKit, we’ve seen that it doesn't just accelerate their existing projects; it fundamentally changes which projects become viable. </p>
    <div>
      <h2>Get started with RealtimeKit</h2>
      <a href="#get-started-with-realtimekit">
        
      </a>
    </div>
    <p>Starting today, you'll notice a new <a href="https://dash.cloudflare.com/?to=/:account/realtime"><u>Realtime section in your Cloudflare Dashboard</u></a>. This section includes our TURN and SFU products alongside our latest product, RealtimeKit. </p><p>RealtimeKit is currently in a closed beta ready for select customers to start kicking the tires. There is currently no cost to test it out during the beta. Request early access <a href="https://www.cloudflare.com/cloudflare-realtimekit-signup/"><u>here</u></a> or via the link in your <a href="https://dash.cloudflare.com/?to=/:account/realtime"><u>Cloudflare dashboard</u></a>. We can’t wait to see what you build. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/53RI7RZhs5Y0zHMHKg6fLh/e155081853355a7714e052ff23db6269/4.png" />
          </figure><p></p> ]]></content:encoded>
            <category><![CDATA[Developer Week]]></category>
            <category><![CDATA[WebRTC]]></category>
            <category><![CDATA[Cloudflare Calls]]></category>
            <category><![CDATA[Real-time]]></category>
            <category><![CDATA[TURN Server]]></category>
            <guid isPermaLink="false">opC8hYtVRkyCEv7Yze4R0</guid>
            <dc:creator>Zaid Farooqui</dc:creator>
            <dc:creator>Will Allen</dc:creator>
            <dc:creator>Abhishek Kankani</dc:creator>
        </item>
        <item>
            <title><![CDATA[Preserving content provenance by integrating Content Credentials into Cloudflare Images]]></title>
            <link>https://blog.cloudflare.com/preserve-content-credentials-with-cloudflare-images/</link>
            <pubDate>Mon, 03 Feb 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Media and journalism companies can now build end-to-end workflows that support Content Credentials by using Cloudflare Images.  ]]></description>
            <content:encoded><![CDATA[ <p>Today, we are thrilled to announce the integration of the <a href="https://c2pa.org/about/"><u>Coalition for Content Provenance and Authenticity (C2PA)</u></a> provenance standard into Cloudflare Images. Content creators and publishers can seamlessly preserve the entire provenance chain — from how an image was created and by whom, to every subsequent edit — across the Cloudflare network.</p>
    <div>
      <h3>What is the C2PA and the Content Authenticity Initiative?</h3>
      <a href="#what-is-the-c2pa-and-the-content-authenticity-initiative">
        
      </a>
    </div>
    <p>When you hear the word provenance, you might have flashbacks to your high school Art History class. In that context, it means that the artwork you see at the <a href="https://www.metmuseum.org/"><u>Met</u></a> in New York really came from the artist in question and isn’t a fake. Its provenance is how that piece of physical art changed possession over time, from the original artist all the way to the museum. </p><p>Digital content provenance builds upon this concept. It helps you understand how a piece of digital media — images, videos, PDFs, and more — was created and subsequently edited. The provenance of a photo I posted on Instagram might look like this: I took the picture with my iPhone, performed an auto-magic edit using Apple Photos’ editing tools, uploaded it to Instagram, cropped it using Instagram’s editing tools, and then posted it. </p><p>Why does digital content provenance matter? At a fundamental level, it’s an important way to give content creators credit for their work. Many photographers have had the experience of seeing their photograph or video go viral online, but with their name and attribution stripped away. In that scenario, the opportunities that might have accrued to the creator once the world saw their work don’t materialize. If you help ensure an artist or content creator gets credit for their work, that exposure could result in more career opportunities. </p><p>Digital content provenance can also be an important tool in understanding the world around us. If you see a video or a photo of a newsworthy event, you’d like to know if that photo was really taken at that particular location, or if it was from years prior at a different location. If you see a grainy picture of a UFO flying over New Jersey, knowing when and where that photo was taken is helpful information in understanding what is actually happening. 
</p><p>The C2PA is a project of the non-profit <a href="https://jointdevelopment.org/"><u>Joint Development Foundation</u></a> and has developed technical specifications for attaching digital content provenance to a piece of media in the form of a JSON manifest. The standards also specify how to cryptographically sign that manifest, thereby allowing anyone to verify that the manifest hasn't been tampered with. The JSON manifests and the associated signatures are together referred to as<b> </b><a href="https://contentcredentials.org/"><b><u>Content Credentials</u></b></a>.</p><p>The Adobe-led <a href="https://contentauthenticity.org/"><u>Content Authenticity Initiative</u></a>, which has thousands of members across a variety of industries, aims to drive global adoption of Content Credentials. </p>
    <div>
      <h3>Why integrate Content Credentials into Cloudflare Images?</h3>
      <a href="#why-integrate-content-credentials-into-cloudflare-images">
        
      </a>
    </div>
    <p><a href="https://www.cloudflare.com/developer-platform/products/cloudflare-images/"><u>Cloudflare Images</u></a> allows you to build an effortlessly scalable and cost-effective image pipeline. With our new Content Credentials integration, you can now preserve existing Content Credentials, ensuring they remain intact from creation all the way to end-user delivery.</p><p>Many media organizations across the globe, such as the BBC, the New York Times, and Dow Jones, are members of the <a href="https://contentauthenticity.org/"><u>Content Authenticity Initiative</u></a>. Imagine one of these news organizations wanted to include the Content Credentials of their photojournalist’s photos and allow anyone to verify the provenance of that image. Before now, even if the news organization was using a C2PA-compliant <a href="https://www.nikonusa.com/press-room/nikon-develops-firmware-that-adds-function-compliant-with-cp2a-standards-to-z6iii"><u>camera</u></a> and <a href="https://helpx.adobe.com/lightroom-cc/using/content-credentials.html"><u>editing flow</u></a>, these credentials would frequently be stripped if the image was transformed by their <a href="https://www.cloudflare.com/learning/cdn/what-is-a-cdn/"><u>CDN</u></a>.</p><p>If you use Cloudflare, that is now a solved problem. In Cloudflare Images, you can now preserve Content Credentials when transforming images from remote sources. Enabling this integration will retain any existing Content Credentials that are embedded in the image.</p><p>When you use Images to resize your images or change their file format, these transformations will be cryptographically signed by Cloudflare. This ensures, for example, that the end-user who sees the photograph on your website can use an open-source verification service such as <a href="https://contentcredentials.org/verify"><u>contentcredentials.org/verify</u></a> to verify the full provenance chain. </p>
    <div>
      <h3>How it works</h3>
      <a href="#how-it-works">
        
      </a>
    </div>
    <p>Imagine you are a photojournalist using <a href="https://www.nikonusa.com/press-room/nikon-develops-firmware-that-adds-function-compliant-with-cp2a-standards-to-z6iii"><u>a Nikon camera that has C2PA-compliant signing</u></a>. You could opt to attach Content Credentials to your photo, identifying key elements of the photograph such as the camera model, the original image size, and aperture settings. </p><p>Below is a simplified example of what a <a href="https://c2pa.org/specifications/specifications/2.1/index.html"><u>C2PA-compliant </u></a>Content Credential for a photograph taken with that Nikon camera could look like. </p><p>Content Credentials are stored using <a href="https://www.iso.org/standard/84635.html"><u>JUMBF</u></a> (JPEG Universal Metadata Box Format), which serves as a standardized container format for embedding metadata within files. You can think of it as an envelope system that packages together both the data about where a piece of digital content came from and how it changed, as well as the cryptographic signatures that can be used to verify that data.</p><p>The assertions, or facts about the content provenance, are typically written in JSON for a better developer experience. Note that this example deliberately simplifies the <a href="https://c2pa.org/specifications/specifications/1.0/specs/C2PA_Specification.html#_use_of_jumbf"><u>JUMBF box nesting</u></a> and adds comments to make it easier to follow.</p>
            <pre><code>{
  "jumbf": {
    "c2pa.manifest": {
      "claim_generator": "Nikon Z9 Firmware v1.2",
      "assertions": [
        {
          "label": "c2pa.actions",
          "data": {
            "actions": [
              {
                "action": "c2pa.captured",
                "when": "2025-01-10T12:00:00Z",
                "softwareAgent": "Nikon Z9",
                "parameters": {
                  "captureDevice": "NIKON Z9",
                  "serialNumber": "7DX12345",
                  "exposure": "1/250",
                  "aperture": "f/2.8",
                  "iso": 100,
                  "focalLength": "70mm"
                }
              }
            ]
          }
        }
      ],
      "signature_info": {
        "issuer": "Nikon",
        "time": "2025-01-10T12:00:00Z",
        "cert_fingerprint": "01234567890abcdef"
      },
      "claim_metadata": {
        "claim_id": "nikon_z9_123"
      }
    }
  }
}</code></pre>
            <p>Now imagine that you want to use this photograph on your website. </p><p>If you’ve enabled the Preserve Content Credentials setting in Cloudflare, then that metadata is now preserved in Cloudflare Images. </p><p>If you use Cloudflare Images to dynamically resize or transform this image, then Cloudflare automatically appends and cryptographically signs any additional actions in that same manifest. Below we show what the new Content Credentials could look like. </p>
            <pre><code>{
  "jumbf": {
    // Original Nikon manifest
    "c2pa.manifest.nikon": {
      /* unchanged */
    },

    // New Cloudflare manifest
    "c2pa.manifest.cloudflare": {
      "claim_generator": "Cloudflare Images",
      "assertions": [
        {
          "label": "c2pa.actions",
          "data": {
            "actions": [
              {
                "action": "c2pa.resized",
                "when": "2025-01-10T12:05:00Z",
                "softwareAgent": "Cloudflare Images",
                "parameters": {
                  "originalDimensions": {
                    "width": 8256,
                    "height": 5504
                  },
                  "newDimensions": {
                    "width": 800,
                    "height": 533
                  }
                }
              }
            ]
          }
        }
      ],
      "signature_info": {
        "issuer": "Cloudflare, Inc",
        "time": "2025-01-10T12:05:00Z",
        "cert_fingerprint": "fedcba9876543210"
      },
      "claim_metadata": {
        "claim_id": "cf_resize_123",
        "parent_claim_id": "nikon_z9_123"
      }
    }
  }
}</code></pre>
            <p>In this example, the <code>c2pa.resized</code> entry describes a non-destructive transformation from one set of dimensions to another. This is included as a separate, independent assertion about this particular photograph. </p><p>Notice how there are two cryptographic signatures, one per manifest, each referenced by a <code>signature_info</code> block. Since there were two entities involved in this example image — Nikon for the image’s creation, then Cloudflare for resizing it — both Nikon and Cloudflare independently signed their respective assertions about the content provenance. </p><p>In this example, Cloudflare’s signature reference looks like this: </p>
            <pre><code>"signature_info": {
        "issuer": "Cloudflare, Inc",
        "time": "2025-01-10T12:05:00Z",
        "cert_fingerprint": "fedcba9876543210"</code></pre>
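            <p>The <code>parent_claim_id</code> field is what links the two manifests into a single chain. As a sketch of how a verifier might order the claims, assuming the simplified manifest shape used in this post:</p>

```python
# Walk the simplified manifest structure from this post, following
# parent_claim_id links to order claims from capture to latest edit.
def provenance_chain(jumbf):
    manifests = {m["claim_metadata"]["claim_id"]: m for m in jumbf.values()}
    parents = {m["claim_metadata"].get("parent_claim_id")
               for m in manifests.values()}
    # The newest claim is the one no other claim lists as its parent.
    tip = next(cid for cid in manifests if cid not in parents)
    chain = []
    while tip is not None:
        manifest = manifests[tip]
        chain.append(manifest["signature_info"]["issuer"])
        tip = manifest["claim_metadata"].get("parent_claim_id")
    return list(reversed(chain))

jumbf = {
    "c2pa.manifest.nikon": {
        "signature_info": {"issuer": "Nikon"},
        "claim_metadata": {"claim_id": "nikon_z9_123"},
    },
    "c2pa.manifest.cloudflare": {
        "signature_info": {"issuer": "Cloudflare, Inc"},
        "claim_metadata": {"claim_id": "cf_resize_123",
                           "parent_claim_id": "nikon_z9_123"},
    },
}
print(provenance_chain(jumbf))  # ['Nikon', 'Cloudflare, Inc']
```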
            <p>During the creation, editing, and resizing process of a piece of digital content, a unique hash of metadata is created for each action and then signed using a private key. The signature, along with the signer’s public certificate or a reference to it, is contained in the JUMBF container as referenced by this JSON. </p><p>These hashes and signatures allow any open-source verification tool to recalculate the hash, validate it against the signature, and check the certificate chain to ensure trustworthiness for each action taken on the image. This is what is meant by Content Credentials being tamper-evident: if any of these hashes and signatures fail to validate, it means that the metadata has been tampered with. </p><p>Each cryptographic signature is part of a <a href="https://opensource.contentauthenticity.org/docs/verify-known-cert-list/"><u>Trust List</u></a>, allowing anyone to verify the provenance chain across various entities, such as from a camera manufacturer to photo editing software to distribution across Cloudflare. <a href="https://opensource.contentauthenticity.org/docs/manifest/signing-manifests"><u>More from the Content Authenticity Initiative</u></a>: </p><blockquote><p><i>Trust lists connect the end-entity certificate that signed a manifest back to the originating root CA. This is accomplished by supplying the subordinate public X.509 certificates forming the trust chain (the public X.509 certificate chain).</i></p></blockquote><p>In order for Cloudflare to append its transformations to the Content Credentials, we needed a publicly available end-entity certificate and had to join this Trust List. We used DigiCert for our end-entity certificate and reference it in the JSON manifests that we are now creating in production:</p>
            <pre><code>"signature_info": {
        "alg": "sha256",
        "issuer": "Cloudflare, Inc",
        "cert_serial_number": "073E9F61ADE599BE128B02EDC5BD2BDE",
        "time": "2024-01-06T22:42:36+00:00"
      },</code></pre>
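            <p>The hash-and-sign mechanics can be illustrated with a deliberately simplified sketch. Real Content Credentials use X.509 certificates and asymmetric signatures rather than the shared-key HMAC shortcut below, but the tamper-evidence property is the same: change any byte of the metadata and verification fails.</p>

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-private-key"  # placeholder; real signing uses X.509 keys

def sign_claim(claim):
    """Hash the claim's metadata deterministically, then 'sign' the hash."""
    digest = hashlib.sha256(json.dumps(claim, sort_keys=True).encode()).digest()
    return hmac.new(SIGNING_KEY, digest, hashlib.sha256).hexdigest()

def verify_claim(claim, signature):
    """Recalculate the hash and compare signatures in constant time."""
    return hmac.compare_digest(sign_claim(claim), signature)

claim = {"action": "c2pa.resized", "newDimensions": {"width": 800, "height": 533}}
signature = sign_claim(claim)
assert verify_claim(claim, signature)        # untouched claim verifies

claim["newDimensions"]["width"] = 9999       # tamper with the metadata...
assert not verify_claim(claim, signature)    # ...and verification now fails
```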
            <p>The end result is that news organizations, journalists, and content companies can now create an auditable chain of digital provenance whose claims can be verified using public-key cryptography.  </p>
    <div>
      <h3>Let’s see an example</h3>
      <a href="#lets-see-an-example">
        
      </a>
    </div>
    <p>Earlier this year, OpenAI announced their <a href="https://openai.com/index/understanding-the-source-of-what-we-see-and-hear-online/"><u>support for including Content Credentials in DALL-E</u></a>. I recently created an image in DALL-E (with ski season on my mind).</p><p>Cloudflare Images allows you to transform any image <a href="https://developers.cloudflare.com/images/transform-images/transform-via-url/"><u>by URL</u></a>. You can do so by simply changing the URL structure using this syntax: </p>
            <pre><code>https://&lt;ZONE&gt;/cdn-cgi/image/&lt;OPTIONS&gt;/&lt;SOURCE-IMAGE&gt;</code></pre>
            <p>We can break down each of these parameters:</p><ul><li><p>ZONE is your particular domain.</p></li><li><p>cdn-cgi/image is a fixed prefix that identifies that this is a special path handled by a built-in Worker.</p></li><li><p>The OPTIONS parameter allows you to then transform the image — rotating it, changing the width, compressing it, and more. </p></li><li><p>SOURCE-IMAGE is the URL where your image is currently hosted.</p></li></ul><p>To tie these together, I then have a new URL structure where I want to change the width and quality of the image I created in DALL-E and display this on my personal website. After uploading the image from DALL-E to one of my R2 buckets, I can create this URL: </p>
            <pre><code>https://williamallen.com/cdn-cgi/image/width=1000,quality=75,format=webp/https://pub-3d2658f6f7004dc38a4dd6be147b6a86.r2.dev/dalle.webp</code></pre>
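            <p>Building these URLs by hand gets tedious once you have several transformation presets. A small helper makes the structure explicit; the function name is ours for illustration, not part of the Images API:</p>

```python
# Assemble a Cloudflare Images transformation URL of the form
# https://<ZONE>/cdn-cgi/image/<OPTIONS>/<SOURCE-IMAGE>
# Helper name is illustrative, not part of any official SDK.
def transform_url(zone, source_image, **options):
    opts = ",".join(f"{key}={value}" for key, value in options.items())
    return f"https://{zone}/cdn-cgi/image/{opts}/{source_image}"

url = transform_url(
    "williamallen.com",
    "https://pub-3d2658f6f7004dc38a4dd6be147b6a86.r2.dev/dalle.webp",
    width=1000, quality=75, format="webp",
)
print(url)
```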
            <p>Anyone can now verify its provenance using the <a href="https://contentcredentials.org/verify?source=https://williamallen.com/cdn-cgi/image/width=1000,quality=75,format=webp/https://pub-3d2658f6f7004dc38a4dd6be147b6a86.r2.dev/dalle.webp"><u>Content Credentials Verify</u></a> tool to see the result. The provenance chain is fully intact, even after using the Cloudflare Images transformation shown above to resize the image. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/p3qbySgWdTzk3IJTevbiU/bae7969708006369a70f8e6cfd7ae2bf/Screenshot_2025-01-29_at_19.00.56.png" />
          </figure><p>There are numerous open source command line tools that allow you to explore the full details of the Content Credentials. <a href="https://opensource.contentauthenticity.org/docs/c2patool/"><u>The C2PA Tool</u></a> is created and maintained by the Content Authenticity Initiative. You can read more about the <a href="https://opensource.contentauthenticity.org/docs/c2patool/docs/usage"><u>tool here</u></a> and view the source code for it on <a href="https://github.com/contentauth/c2pa-rs/tree/main/cli"><u>GitHub</u></a>.</p><p>There are two ways to <a href="https://github.com/contentauth/c2pa-rs/tree/main/cli#installation"><u>install the tool</u></a>: through a pre-built binary executable, or using <a href="https://lib.rs/crates/cargo-binstall"><u>Cargo Binstall</u></a> if you have already installed Rust. Once installed, the C2PA Tool uses this syntax in your command line: </p>
            <pre><code>c2patool [OPTIONS] &lt;PATH&gt; [COMMAND]</code></pre>
            <p>If I navigate to the <a href="https://williamallen.com/cdn-cgi/image/width=1000,quality=75/https://pub-3d2658f6f7004dc38a4dd6be147b6a86.r2.dev/dalle.webp"><u>link of the image</u></a> in my browser and save it to my downloads folder on my Mac, then I simply need to use the command -d (short for --detailed) to see the full details of the JSON manifest. Of course, you should change <i>yourusername</i> to your actual Mac username.</p>
            <pre><code>c2patool /Users/yourusername/Downloads/dalle.webp -d</code></pre>
            <p>And if you wanted to output this to a JSON file that you can review in VSCode or Cursor, use this command instead:</p>
            <pre><code>c2patool /Users/yourusername/Downloads/dalle.webp -d &gt; manifest.json</code></pre>
            <p>This allows you to not just trust, but verify, the details of the image transformation yourself. </p>
    <div>
      <h3>How to start using Cloudflare Images with Content Credentials </h3>
      <a href="#how-to-start-using-cloudflare-images-with-content-credentials">
        
      </a>
    </div>
    <p>It’s straightforward to start preserving Content Credentials. Log in to your Cloudflare dashboard and navigate to Images. From there, select Transformations and choose the Zone where you want to enable this feature. Then toggle this option to on:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6R24DfomWKBsb5SKMFmZU7/99049bd789ea89b4869bef9c8f0040f2/image2.png" />
          </figure><p>If the images you are transforming do not contain any Content Credentials, no action is taken. But if they do, we preserve those Content Credentials and attest to any transformations. </p>
    <div>
      <h3>Looking ahead</h3>
      <a href="#looking-ahead">
        
      </a>
    </div>
    <p>We are excited to continue to partner with Adobe and many other organizations to extend support for preserving Content Credentials across our products and services. If you are interested in learning more, we’d love to hear from you: I’m <a href="https://x.com/williamallen"><u>@williamallen</u></a> on X or on <a href="https://www.linkedin.com/in/williamallen2050/"><u>LinkedIn</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[Cloudflare Images]]></category>
            <category><![CDATA[Image Resizing]]></category>
            <category><![CDATA[Image Storage]]></category>
            <guid isPermaLink="false">6ARFFVfePEkB83ZjH6H8vg</guid>
            <dc:creator>Will Allen</dc:creator>
        </item>
        <item>
            <title><![CDATA[Bring multimodal real-time interaction to your AI applications with Cloudflare Calls]]></title>
            <link>https://blog.cloudflare.com/bring-multimodal-real-time-interaction-to-your-ai-applications-with-cloudflare-calls/</link>
            <pubDate>Fri, 20 Dec 2024 14:00:00 GMT</pubDate>
            <description><![CDATA[ Bring ChatGPT to your next video meeting with Cloudflare Calls.  ]]></description>
            <content:encoded><![CDATA[ <p>OpenAI announced support for WebRTC in their <a href="https://platform.openai.com/docs/guides/realtime"><u>Realtime API</u></a> on December 17, 2024. Combining their Realtime API with <a href="https://www.cloudflare.com/developer-platform/products/cloudflare-calls/"><u>Cloudflare Calls</u></a> allows you to build experiences that weren’t possible just a few days earlier.</p><p>Previously, interactions with audio and video AIs were largely <i>single-player</i>: only one person could be interacting with the AI unless you were in the same physical room. Now, applications built using Cloudflare Calls and OpenAI’s Realtime API can support multiple users across the globe simultaneously seeing and interacting with a voice or video AI.</p>
    <div>
      <h2>Have your AI join your video calls </h2>
      <a href="#have-your-ai-join-your-video-calls">
        
      </a>
    </div>
    <p>Here’s what this means in practice: you can now invite ChatGPT to your next video meeting:</p><div>
  
</div><p>We built this into our <a href="https://github.com/cloudflare/orange"><u>Orange Meets</u></a> demo app to serve as inspiration for what is possible, but the opportunities are much broader.</p><p>In the not-too-distant future, every company could have a 'corporate AI' they invite to their internal meetings that is secure, private, and has access to their company data. Imagine this sort of real-time audio and video interaction with your company’s AI:</p><p>"Hey ChatGPT, do we have any open Jira tickets about this?"</p><p>"Hey Company AI, who are the competitors in the space doing Y?"</p><p>"AI, is XYZ a big customer? How much more did they spend with us vs last year?"</p><p>There are similar opportunities if your application is built for consumers: broadcasts and global livestreams can become much more interactive. The murder mystery game in the video above is just one example: you could build your own to play live with your friends in different cities.</p>
    <div>
      <h2>WebRTC vs. WebSockets</h2>
      <a href="#webrtc-vs-websockets">
        
      </a>
    </div>
    <p>These interactive multimedia experiences are enabled by the industry adoption of <a href="https://www.w3.org/TR/webrtc/"><u>WebRTC</u></a>, which stands for Web Real-Time Communication.</p><p>Many real-time product experiences have historically used <a href="https://developer.mozilla.org/en-US/docs/Glossary/WebSockets"><u>WebSockets</u></a> instead of WebRTC. WebSockets operate over a single, persistent TCP connection established between a client and a server. This is useful for keeping data in sync in a text-based chat app or maintaining the state of gameplay in your favorite video game. Cloudflare has extensive support for WebSockets <a href="https://developers.cloudflare.com/network/websockets/"><u>across our network</u></a> as well as <a href="https://blog.cloudflare.com/do-it-again/"><u>in our AI Gateway</u></a>.</p><p>If you were building a chat application before WebSockets, you would likely have your client-side app poll the server every n seconds to see if there were new messages to display. WebSockets eliminated the need for polling: instead, the client and the server establish a persistent, long-running connection to send and receive messages.</p><p>However, once you have multiple users across geographies simultaneously interacting with voice and video, even small delays in the data sync can make for an unacceptable product experience. Imagine building an app that does real-time translation of audio. With WebSockets, you would need to chunk the audio input so that each chunk contains 100–500 milliseconds of audio. That chunk size, along with <a href="https://blog.cloudflare.com/cloudflare-calls-anycast-webrtc/#webrtc-growing-pains"><u>head-of-line blocking</u></a>, becomes the latency floor for your ability to deliver a real-time multimodal experience to your users.</p><p>WebRTC solves this problem with native support for audio and video tracks over UDP-based channels directly between users, eliminating the need for chunking. This lets you stream audio and video data to an AI model from multiple users and receive audio and video data back from the AI model in real time.</p>
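<p>To make that latency floor concrete, here is a minimal TypeScript sketch of the batching a WebSocket-based audio pipeline would need. The helper name and the numbers are illustrative only, not part of any Cloudflare or OpenAI API:</p>

```typescript
// Illustrative sketch of the WebSocket chunking latency floor: raw PCM
// samples must be batched into fixed-duration chunks before sending, so no
// sample can be transmitted sooner than one chunk duration after capture.
function chunkAudio(
	samples: Float32Array,
	sampleRate: number,
	chunkMs: number
): Float32Array[] {
	const samplesPerChunk = Math.floor((sampleRate * chunkMs) / 1000);
	const chunks: Float32Array[] = [];
	for (let i = 0; i < samples.length; i += samplesPerChunk) {
		chunks.push(samples.subarray(i, i + samplesPerChunk));
	}
	return chunks;
}

// One second of 48 kHz audio at 100 ms chunks yields 10 chunks; every
// sample waits up to 100 ms before it can even leave the client.
const oneSecond = new Float32Array(48_000);
const chunks = chunkAudio(oneSecond, 48_000, 100);
```

<p>WebRTC removes this floor by streaming media continuously instead of batching it.</p>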
    <div>
      <h2>Realtime AI fanout using Cloudflare Calls</h2>
      <a href="#realtime-ai-fanout-using-cloudflare-calls">
        
      </a>
    </div>
    <p>Historically, setting up the underlying infrastructure for WebRTC — servers for media routing, TURN relays, global availability — could be challenging.</p><p><a href="https://developers.cloudflare.com/calls/introduction/"><u>Cloudflare Calls</u></a> handles the entirety of this complexity for developers, allowing them to leverage WebRTC without needing to worry about servers, regions, or scaling. Cloudflare Calls works as a single mesh network that automatically connects each user to a server close to them. Calls can connect directly with other WebRTC-powered services such as OpenAI’s, letting you deliver the output with near-zero latency to hundreds or thousands of users.</p><p>Privacy and security also come standard: all video and audio traffic that passes through Cloudflare Calls is encrypted by default. In this particular demo, we take it a step further with a button that lets you decide when ChatGPT may listen to and interact with the meeting participants, giving you a more granular and targeted privacy and security posture.</p>
    <div>
      <h2>How we connected Cloudflare Calls to OpenAI’s Realtime API </h2>
      <a href="#how-we-connected-cloudflare-calls-to-openais-realtime-api">
        
      </a>
    </div>
    <p>Cloudflare Calls has three building blocks: <a href="https://developers.cloudflare.com/calls/sessions-tracks/"><u>Applications, Sessions, and Tracks</u></a><b>:</b></p><blockquote><p><i>“A </i><b><i>Session</i></b><i> in Cloudflare Calls correlates directly to a WebRTC PeerConnection. It represents the establishment of a communication channel between a client and the nearest Cloudflare data center, as determined by Cloudflare's anycast routing … </i></p><p><i>Within a Session, there can be one or more </i><b><i>Tracks</i></b><i>. … [which] align with the MediaStreamTrack concept, facilitating audio, video, or data transmission.”</i></p></blockquote><p>To include ChatGPT in our video conferencing demo, we needed to add ChatGPT as a <i>track</i> in an ongoing <i>session. </i>To do this, we connected to the Realtime API in <a href="https://github.com/cloudflare/orange"><u>Orange Meets</u></a>:</p>
            <pre><code>// Connect Cloudflare Calls sessions and tracks like a switchboard.
// APP_TOKEN and callsEndpoint (the app's Calls credentials and API base URL)
// are defined elsewhere in Orange Meets.
async function connectHumanAndOpenAI(
	humanSessionId: string,
	openAiSessionId: string
) {
	const callsApiHeaders = {
		Authorization: `Bearer ${APP_TOKEN}`,
		'Content-Type': 'application/json',
	}
	// Pull OpenAI audio track to human's track
	await fetch(`${callsEndpoint}/sessions/${humanSessionId}/tracks/new`, {
		method: 'POST',
		headers: callsApiHeaders,
		body: JSON.stringify({
			tracks: [
				{
					location: 'remote',
					sessionId: openAiSessionId,
					trackName: 'ai-generated-voice',
					mid: '#user-mic',
				},
			],
		}),
	})
	// Pull human's audio track to OpenAI's track
	await fetch(`${callsEndpoint}/sessions/${openAiSessionId}/tracks/new`, {
		method: 'POST',
		headers: callsApiHeaders,
		body: JSON.stringify({
			tracks: [
				{
					location: 'remote',
					sessionId: humanSessionId,
					trackName: 'user-mic',
					mid: '#ai-generated-voice',
				},
			],
		}),
	})
}</code></pre>
            <p>This code sets up bidirectional audio routing between the human’s session and ChatGPT’s, allowing the humans to hear ChatGPT and ChatGPT to hear the humans.</p><p>You can review all the code for this demo app on <a href="https://github.com/cloudflare/orange?tab=readme-ov-file#readme"><u>GitHub</u></a>.</p>
    <div>
      <h2>Get started today </h2>
      <a href="#get-started-today">
        
      </a>
    </div>
    <p>Give the Cloudflare Calls + OpenAI Realtime API <a href="https://demo.orange.cloudflare.dev/"><u>demo a try</u></a> for yourself and review how it was built via <a href="https://github.com/cloudflare/orange?tab=readme-ov-file#readme"><u>the source code on GitHub</u></a>. Then get started today with <a href="https://developers.cloudflare.com/calls/introduction/"><u>Cloudflare Calls </u></a>to bring real-time, interactive AI to your apps and services.</p> ]]></content:encoded>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Cloudflare Calls]]></category>
            <category><![CDATA[WebRTC]]></category>
            <guid isPermaLink="false">HTZlONeYfVQ79aKvAsgxI</guid>
            <dc:creator>Will Allen</dc:creator>
            <dc:creator>Felipe Astroza Araya</dc:creator>
            <dc:creator>Kevin Kipp</dc:creator>
        </item>
        <item>
            <title><![CDATA[Robotcop: enforcing your robots.txt policies and stopping bots before they reach your website]]></title>
            <link>https://blog.cloudflare.com/ai-audit-enforcing-robots-txt/</link>
            <pubDate>Tue, 10 Dec 2024 14:00:00 GMT</pubDate>
            <description><![CDATA[ AI Crawl Control (formerly AI Audit) now allows you to quickly see which AI services are honoring your robots.txt policies and then automatically enforce those policies against the ones that aren’t. ]]></description>
            <content:encoded><![CDATA[ <p>Cloudflare’s <a href="https://blog.cloudflare.com/cloudflare-ai-audit-control-ai-content-crawlers/"><u>AI Crawl Control</u> <i><u>(formerly AI Audit)</u></i></a> dashboard allows you to easily understand how AI companies and services access your content. AI Crawl Control gives a summary of request counts broken out by bot, detailed path summaries for more granular insights, and the ability to filter by categories like <b>AI Search</b> or <b>AI Crawler</b>.</p><p>Today, we're going one step further. You can now quickly see which AI services are honoring your robots.txt policies, which aren’t, and then programmatically enforce these policies.</p>
    <div>
      <h3>What is robots.txt?</h3>
      <a href="#what-is-robots-txt">
        
      </a>
    </div>
    <p><a href="https://www.cloudflare.com/learning/bots/what-is-robots-txt/"><u>Robots.txt</u></a> is a plain text file hosted on your domain that implements the <a href="https://www.rfc-editor.org/rfc/rfc9309.html"><u>Robots Exclusion Protocol</u></a>, a standard that has been around since 1994. This file tells crawlers like Google, Bing, and many others which parts of your site, if any, they are allowed to access. </p><p>There are many reasons why site owners would want to define which portions of their websites crawlers are allowed to access: they might not want certain content available on search engines or social networks, they might trust one platform more than another, or they might simply want to reduce automated traffic to their servers.</p><p>With the advent of <a href="https://www.cloudflare.com/learning/ai/what-is-generative-ai/"><u>generative AI</u></a>, AI services have started crawling the Internet to collect training data for their models. These models are often proprietary and commercial and are used to generate new content. Many content creators and publishers that want to exercise control over how their content is used have started using robots.txt to declare policies that cover these AI bots, in addition to the traditional search engines.</p><p>Here’s an abbreviated real-world example of the robots.txt policy from a top online news site:</p>
            <pre><code>User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /
</code></pre>
            <p>This policy declares that the news site doesn't want ChatGPT, Anthropic AI, Google Gemini, or ByteDance’s Bytespider to crawl any of its content.</p>
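<p>To illustrate how a policy like the one above is interpreted, here is a simplified TypeScript sketch of a robots.txt check. This is a hypothetical helper, not the parser AI Crawl Control uses, and it skips parts of RFC 9309 such as wildcards, Allow lines, and longest-match precedence:</p>

```typescript
// Simplified robots.txt check: is a user-agent disallowed from a path?
// Hypothetical sketch only; real parsers implement the full RFC 9309 rules.
function isDisallowed(robotsTxt: string, userAgent: string, path: string): boolean {
	let currentAgents: string[] = [];
	let inGroup = false; // true once we've seen a rule line for the current group
	for (const rawLine of robotsTxt.split('\n')) {
		const line = rawLine.split('#')[0].trim(); // strip comments and whitespace
		if (!line) continue;
		const [field, ...rest] = line.split(':');
		const value = rest.join(':').trim();
		if (field.trim().toLowerCase() === 'user-agent') {
			if (inGroup) currentAgents = []; // a new group starts
			inGroup = false;
			currentAgents.push(value.toLowerCase());
		} else if (field.trim().toLowerCase() === 'disallow') {
			inGroup = true;
			const applies = currentAgents.includes(userAgent.toLowerCase());
			if (applies && value !== '' && path.startsWith(value)) {
				return true;
			}
		}
	}
	return false;
}

const policy = `User-agent: GPTBot
Disallow: /

User-agent: Bytespider
Disallow: /private/`;

const gptBlocked = isDisallowed(policy, 'GPTBot', '/news/article');            // true
const bytespiderBlocked = isDisallowed(policy, 'Bytespider', '/news/article'); // false
```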
    <div>
      <h3>From voluntary compliance to enforcement</h3>
      <a href="#from-voluntary-compliance-to-enforcement">
        
      </a>
    </div>
    <p>Compliance with the Robots Exclusion Protocol has historically been voluntary.</p><p>That’s where our new feature comes in. We’ve extended <a href="https://blog.cloudflare.com/cloudflare-ai-audit-control-ai-content-crawlers/"><u>AI Crawl Control</u></a> to give our customers both visibility into how AI service providers honor their robots.txt policies <i>and</i> the ability to enforce those policies at the network level in the <a href="https://developers.cloudflare.com/waf/"><u>WAF</u></a>.</p><p>Your robots.txt file declares your policy, but now we can help you enforce it. You might even call it … your Robotcop.</p>
    <div>
      <h3>How it works</h3>
      <a href="#how-it-works">
        
      </a>
    </div>
    <p>AI Crawl Control takes the robots.txt files from your web properties, parses them, and matches their rules against the AI bot traffic we see for the selected property. The summary table gives you an aggregated view of the number of requests and violations we see for every bot across all paths. If you hover your mouse over the Robots.txt column, we show the policies defined for each bot in a tooltip. You can also filter by violations from the top of the page.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/o2hHH0Nm68muUzaxmbx7E/0b9c2acfb33f2ca2d59e00625b4d0fc7/BLOG-2619_2.png" />
          </figure><p>In the “Most popular paths” section, whenever a path in your site gets traffic that has violated your policy, we flag it for visibility. Ideally, you wouldn't see violations in the Robots.txt column — if you do see them, someone's not complying.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1o5sChT2d6QK8JNPejImVk/79590e1721644a2fd067784bb9ce862e/BLOG-2619_3.png" />
          </figure><p>More importantly, AI Crawl Control also allows you to enforce your robots.txt policy at the network level. When you press the "Enforce robots.txt rules" button at the top of the summary table, we automatically translate the rules defined for AI bots in your robots.txt into an advanced firewall rule, redirect you to the WAF configuration screen, and let you deploy the rule on our network.</p><p>This is how the robots.txt policy mentioned above looks after translation:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5qYJG3RcvrDxzVDtb28Q2J/d73d7dcea94acb261e9fc525427c2e77/BLOG-2619_4.png" />
          </figure><p>Once you deploy a WAF rule built from your robots.txt policies, you are no longer simply requesting that AI services respect your policy: you're enforcing it.</p>
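<p>Conceptually, the translation collects the user agents your robots.txt disallows site-wide and combines them into a single expression in Cloudflare's Rules language. The sketch below is illustrative: the <code>http.user_agent</code> field and <code>contains</code> operator are real parts of the Rules language, but the helper and the exact shape of the generated rule are assumptions, not the production code:</p>

```typescript
// Illustrative sketch: turn a list of disallowed user agents into a single
// Rules-language expression. Not Cloudflare's actual translation code.
function robotsToWafExpression(disallowedAgents: string[]): string {
	return disallowedAgents
		.map((agent) => `(http.user_agent contains "${agent}")`)
		.join(' or ');
}

const expression = robotsToWafExpression([
	'GPTBot',
	'ChatGPT-User',
	'anthropic-ai',
	'Google-Extended',
	'Bytespider',
]);
// expression starts with: (http.user_agent contains "GPTBot") or ...
```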
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>With AI Crawl Control, we are giving our customers even more visibility into how AI services access their content, helping them define their policies and then enforce them at the network level.</p><p>This feature is live today for all Cloudflare customers. Simply log into the dashboard and navigate to your domain to begin auditing the bot traffic from AI services and enforcing your robots.txt directives.</p>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Network Services]]></category>
            <category><![CDATA[Application Services]]></category>
            <category><![CDATA[security.txt]]></category>
            <guid isPermaLink="false">6Bi6mGvw8vrskNZ7Mmp73F</guid>
            <dc:creator>Celso Martinho</dc:creator>
            <dc:creator>Will Allen</dc:creator>
            <dc:creator>Nelson Duarte</dc:creator>
        </item>
    </channel>
</rss>