
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, the technologies we use, and how to join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Thu, 30 Apr 2026 00:01:07 GMT</lastBuildDate>
        <item>
            <title><![CDATA[AI Search: the search primitive for your agents]]></title>
            <link>https://blog.cloudflare.com/ai-search-agent-primitive/</link>
            <pubDate>Thu, 16 Apr 2026 13:00:22 GMT</pubDate>
            <description><![CDATA[ AI Search is the search primitive for your agents. Create instances dynamically, upload files, and search across instances with hybrid retrieval and relevance boosting. Just create a search instance, upload, and search.
 ]]></description>
            <content:encoded><![CDATA[ <p>Every <a href="https://www.cloudflare.com/learning/ai/what-is-agentic-ai/"><u>agent</u></a> needs search: Coding agents search millions of files across repos, or support agents search customer tickets and internal docs. The use cases are different, but the underlying problem is the same: get the right information to the model at the right time.</p><p>If you're building search yourself, you need a vector index, an indexing pipeline that parses and chunks your documents, and something to keep the index up to date when your data changes. If you also need keyword search, that's a separate index and fusion logic on top. And if each of your agents needs its own searchable context, you're setting all of that up per agent. </p><p><a href="https://developers.cloudflare.com/ai-search/"><u>AI Search</u></a> (formerly <a href="https://blog.cloudflare.com/introducing-autorag-on-cloudflare/"><u>AutoRAG</u></a>) is the plug-and-play search primitive you need. You can dynamically create instances, give it your data, and search — from a Worker, the Agents SDK, or Wrangler CLI. Here's what we're shipping:</p><ul><li><p><b>Hybrid search</b>. Enable both semantic and keyword matching in the same query. Vector search and BM25 run in parallel and results are fused. (The search on our blog is now powered by AI Search. <i>Try the magnifying glass icon to the top right.</i>)</p></li><li><p><b>Built-in storage and index.</b> New instances come with their own storage and vector index. Upload files directly to an instance via API and they're indexed. No R2 buckets to set up, no external data sources to connect first. 
</p></li><li><p><b>Dynamic instance management.</b> The new <code>ai_search_namespaces</code> binding lets you create and delete instances at runtime from your Worker, so you can spin up one per agent, per customer, or per language without redeployment.</p></li></ul><p>You can now also attach metadata to documents and use it to boost rankings at query time, and query across multiple instances in a single call.</p><p>Now, let's look at what this means in practice.</p>
    <div>
      <h2>In action: Customer Support Agent</h2>
      <a href="#in-action-customer-support-agent">
        
      </a>
    </div>
    <p>Let's walk through a support agent that searches for two kinds of knowledge: shared product docs, and per-customer history like past resolutions. The product docs are too large to fit in a context window, and each customer's history grows with every resolved issue, so the agent needs retrieval to find what's relevant.</p><p>Here's what that looks like with AI Search and the <a href="https://developers.cloudflare.com/agents"><u>Agents SDK</u></a>. Start by scaffolding a project:</p>
            <pre><code>npm create cloudflare@latest -- --template cloudflare/agents-starter
</code></pre>
            <p>Then, bind an AI Search namespace to your Worker:</p>
            <pre><code>// wrangler.jsonc 
{
  "ai_search_namespaces": [
    { "binding": "SUPPORT_KB", "namespace": "support" }
  ],
  "ai": { "binding": "AI" },
  "durable_objects": {
    "bindings": [
      { "name": "SupportAgent", "class_name": "SupportAgent" }
    ]
  }
}
</code></pre>
            <p>Let's say your shared product documentation lives in an R2 bucket called <code>product-doc</code>. From the Cloudflare Dashboard, you can create a one-off AI Search instance named <code>product-knowledge</code> in the <code>support</code> namespace, backed by that bucket:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1b8NdFL2HDBy8FqBHEI679/f17ed98d45fb9b42a616e0b464460489/BLOG-3240_2.png" />
          </figure><p>That's your shared knowledge base, the docs every agent can reference.</p><p>When a customer comes back with a new issue, knowing what's already been tried saves everyone time. You can track this by creating an AI Search instance per customer. After each resolved issue, the agent saves a summary of what went wrong and how it was fixed. Over time, this builds up a searchable log of past resolutions. You can create instances dynamically using the namespace binding:</p>
            <pre><code>// create a per-customer instance when they first show up 
await env.SUPPORT_KB.create({
  id: `customer-${customerId}`,
  index_method: { keyword: true, vector: true }
});
</code></pre>
            <p>Each instance gets its own built-in storage and vector index — powered by <a href="https://www.cloudflare.com/developer-platform/products/r2/"><u>R2</u></a> and <a href="https://www.cloudflare.com/developer-platform/products/vectorize/"><u>Vectorize</u></a>. The instance starts empty and accumulates context over time. Next time the customer comes back, all of it is searchable.</p><p>Here's what the namespace looks like after a few customers:</p>
            <pre><code>namespace: "support"
├── product-knowledge     (R2 as source, shared across all agents)
├── customer-abc123       (managed storage, per-customer)
├── customer-def456       (managed storage, per-customer)
└── customer-ghi789       (managed storage, per-customer)

</code></pre>
            <p>Now the agent itself. It extends <code>AIChatAgent</code> from the Agents SDK and defines two tools. We're using <a href="https://blog.cloudflare.com/workers-ai-large-models/"><u>Kimi K2.5</u></a> as the LLM via <a href="https://www.cloudflare.com/developer-platform/products/workers-ai/"><u>Workers AI</u></a>. The model decides when to call the tools based on the conversation:</p>
            <pre><code>import { AIChatAgent, type OnChatMessageOptions } from "@cloudflare/ai-chat";
import { createWorkersAI } from "workers-ai-provider";
import { streamText, convertToModelMessages, tool, stepCountIs } from "ai";
import { routeAgentRequest } from "agents";
import { z } from "zod";

export class SupportAgent extends AIChatAgent&lt;Env&gt; {
  async onChatMessage(_onFinish: unknown, options?: OnChatMessageOptions) {
    // the client passes customerId in the request body
    // via the Agent SDK's sendMessage({ body: { customerId } })
    const customerId = options?.body?.customerId;

    // create a per-customer instance when they first show up.
    // each instance gets its own storage and vector index.
    if (customerId) {
      try {
        await this.env.SUPPORT_KB.create({
          id: `customer-${customerId}`,
          index_method: { keyword: true, vector: true }
        });
      } catch {
        // instance already exists
      }
    }

    const workersai = createWorkersAI({ binding: this.env.AI });

    const result = streamText({
      model: workersai("@cf/moonshotai/kimi-k2.5"),
      system: `You are a support agent. Use search_knowledge_base
        to find relevant docs before answering. Search results
        include both product docs and this customer's past
        resolutions — use them to avoid repeating failed fixes
        and to recognize recurring issues. When the issue is
        resolved, call save_resolution before responding.`,
      // this.messages is the full conversation history, automatically
      // persisted by AIChatAgent across reconnects
      messages: await convertToModelMessages(this.messages),
      tools: {
        // tool 1: search across shared product docs AND this
        // customer's past resolutions in a single call
        search_knowledge_base: tool({
          description: "Search product docs and customer history",
          inputSchema: z.object({
            query: z.string().describe("The search query"),
          }),
          execute: async ({ query }) =&gt; {
            // always search product docs;
            // include customer history if available
            const instances = ["product-knowledge"];
            if (customerId) {
              instances.push(`customer-${customerId}`);
            }
            return await this.env.SUPPORT_KB.search({
              query: query,
              ai_search_options: {
                // surface recent docs over older ones
                boost_by: [
                  { field: "timestamp", direction: "desc" }
                ],
                // search across both instances at once
                instance_ids: instances
              }
            });
          }
        }),

        // tool 2: after resolving an issue, the agent saves a
        // summary so future agents have full context
        save_resolution: tool({
          description:
            "Save a resolution summary after solving a customer's issue",
          inputSchema: z.object({
            filename: z.string().describe(
              "Short descriptive filename, e.g. 'billing-fix.md'"
            ),
            content: z.string().describe(
              "What the problem was, what caused it, and how it was resolved"
            ),
          }),
          execute: async ({ filename, content }) =&gt; {
            if (!customerId) return { error: "No customer ID" };
            const instance = this.env.SUPPORT_KB.get(
              `customer-${customerId}`
            );
            // uploadAndPoll waits until indexing is complete,
            // so the resolution is searchable before the next query
            const item = await instance.items.uploadAndPoll(
              filename, content
            );
            return { saved: true, filename, status: item.status };
          }
        }),
      },
      // cap agentic tool-use loops at 10 steps
      stopWhen: stepCountIs(10),
      abortSignal: options?.abortSignal,
    });

    return result.toUIMessageStreamResponse();
  }
}

// route requests to the SupportAgent durable object
export default {
  async fetch(request: Request, env: Env) {
    return (
      (await routeAgentRequest(request, env)) ||
      new Response("Not found", { status: 404 })
    );
  }
} satisfies ExportedHandler&lt;Env&gt;;
</code></pre>
            <p>With this, the model decides when to search and when to save. When it searches, it queries <code>product-knowledge</code> and this customer's past resolutions together. When the issue is resolved, it saves a summary that's immediately searchable in future conversations. </p>
    <div>
      <h2>How AI Search finds what you're looking for</h2>
      <a href="#how-ai-search-finds-what-youre-looking-for">
        
      </a>
    </div>
    <p>Under the hood, AI Search runs a multi-step retrieval pipeline, in which every step is configurable.</p>
    <div>
      <h3>Hybrid Search: search that understands intent and matches terms</h3>
      <a href="#hybrid-search-search-that-understands-intent-and-matches-terms">
        
      </a>
    </div>
    <p>Until now, AI Search only offered vector search. Vector search is great at understanding intent, but it can lose specifics. For a query like "ERR_CONNECTION_REFUSED timeout", the embedding captures the broad concept of connection failures. But the user isn't looking for general networking docs; they're looking for the specific document that mentions "ERR_CONNECTION_REFUSED". Vector search might return results about troubleshooting without ever surfacing the page that contains that exact error string.</p><p>Keyword search fills that gap. AI Search now supports BM25, one of the most widely used retrieval scoring functions. BM25 scores documents by how often your query terms appear, how rare those terms are across the entire corpus, and how long the document is. It rewards matches on specific terms, penalizes common filler words, and normalizes for document length. When you search "ERR_CONNECTION_REFUSED timeout", BM25 finds documents that actually contain "ERR_CONNECTION_REFUSED" as a term. However, BM25 may miss a page about "troubleshooting network connections" even though it describes the same problem. That's where vector search shines, and why you need both.</p><p>When you enable hybrid search, it runs vector and BM25 in parallel, fuses the results, and optionally reranks them:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/27CV8IBS2dYTV5puCtIPmD/3c66c190127fa38c4a4275425de8f9c4/BLOG-3240_3.png" />
          </figure><p>Let's take a look at the new configuration options for BM25, and how they come together.</p><ol><li><p><b>Tokenizer </b>controls how your documents are broken into matchable terms at index time. Porter stemmer (option: <code>porter</code>) stems words so "running" matches "run." Trigram (option: <code>trigram</code>) matches character substrings so "conf" matches "configuration." Use porter for natural language content like docs, and trigram for code where partial matches matter.</p></li><li><p><b>Keyword match mode </b>controls which documents are candidates for BM25 scoring at query time. <code>AND</code> requires all query terms to appear in a document; <code>OR</code> includes anything with at least one match.</p></li><li><p><b>Fusion </b>controls how vector and keyword results are combined into the final list at query time. Reciprocal rank fusion (option: <code>rrf</code>) merges by rank position rather than score, which avoids comparing two incompatible scoring scales, whereas max fusion (option: <code>max</code>) takes the higher score.</p></li><li><p><b>(Optional) Reranking </b>adds a cross-encoder pass that re-scores results by evaluating the query and document together as a pair. It can catch cases where a result has the right terms but isn't answering the question.</p></li></ol><p>Every option has a sane default when omitted, and you can configure what matters when you create a new instance:</p>
            <pre><code>const instance = await env.AI_SEARCH.create({
  id: "my-instance",
  index_method: { keyword: true, vector: true },
  indexing_options: {
    keyword_tokenizer: "porter"
  },
  retrieval_options: {
    keyword_match_mode: "or"
  },
  fusion_method: "rrf",
  reranking: true,
  reranking_model: "@cf/baai/bge-reranker-base"
});
</code></pre>
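<p>To make the fusion step concrete, here's a minimal sketch of reciprocal rank fusion. This is an illustration of the technique, not AI Search's internal implementation; the constant of 60 comes from the original RRF formulation, and the document IDs are invented for the example:</p>

```typescript
// Reciprocal rank fusion: each document earns 1 / (k + rank) from
// every result list it appears in, so agreement between lists matters
// more than any single list's raw scores. Illustrative sketch only.
function rrfFuse(rankedLists: string[][], k = 60): string[] {
  const scores: { [id: string]: number } = {};
  for (const list of rankedLists) {
    list.forEach((id, rank) => {
      scores[id] = (scores[id] ?? 0) + 1 / (k + rank + 1);
    });
  }
  return Object.entries(scores)
    .sort((a, b) => b[1] - a[1])
    .map((entry) => entry[0]);
}

// hypothetical result lists from the two retrievers
const vectorHits = ["doc-net-guide", "doc-timeouts", "doc-dns"];
const keywordHits = ["doc-timeouts", "doc-err-refused", "doc-net-guide"];

// "doc-timeouts" fuses to the top: it ranked highly in BOTH lists
console.log(rrfFuse([vectorHits, keywordHits]));
```

<p>Because RRF only looks at rank positions, a BM25 score and a cosine similarity never need to be put on a common scale.</p>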
            
    <div>
      <h3>Boost relevance: surface what matters</h3>
      <a href="#boost-relevance-surface-what-matters">
        
      </a>
    </div>
    <p>Retrieval gets you relevant results, but relevance alone isn't always enough. For example, in a news search, an article from last week and an article from three years ago might both be semantically relevant to "election results," but most users probably want the recent one. Boosting lets you layer business logic on top of retrieval by nudging rankings based on document metadata.</p><p>You can boost on timestamp (built in on every item) or any <a href="https://developers.cloudflare.com/ai-search/configuration/indexing/metadata/"><u>custom metadata field</u></a> you define.</p>
            <pre><code>// boost recent docs over older ones
const results = await instance.search({
  query: "deployment guide",
  ai_search_options: {
    boost_by: [
      { field: "timestamp", direction: "desc" }
    ]
  }
});
</code></pre>
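<p>To build intuition for what a <code>boost_by</code> on <code>timestamp</code> does, here's a sketch of recency-boosted re-ranking. The 0.8 / 0.2 blend weights and the example documents are invented for illustration; AI Search's actual boosting function is internal to the service:</p>

```typescript
interface Hit {
  id: string;
  score: number;     // retrieval relevance, 0..1
  timestamp: number; // unix seconds
}

// Illustrative sketch: blend the retrieval score with a normalized
// recency bonus, then re-sort. The weights here are made up.
function boostByRecency(hits: Hit[]): Hit[] {
  const times = hits.map((h) => h.timestamp);
  const oldest = Math.min(...times);
  const span = Math.max(Math.max(...times) - oldest, 1);
  return hits
    .map((h) => ({
      ...h,
      score: 0.8 * h.score + 0.2 * ((h.timestamp - oldest) / span),
    }))
    .sort((a, b) => b.score - a.score);
}

// the older article is slightly more relevant on its own, but the
// recent one is nudged above it by the recency bonus
const hits: Hit[] = [
  { id: "results-2023", score: 0.90, timestamp: 1672531200 },
  { id: "results-2026", score: 0.88, timestamp: 1767225600 },
];
console.log(boostByRecency(hits)[0].id); // "results-2026"
```

<p>The same shape of logic applies to any custom metadata field: boosting re-weights an already-relevant result set rather than filtering it.</p>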
            
    <div>
      <h3>Cross-instance search: query across boundaries</h3>
      <a href="#cross-instance-search-query-across-boundaries">
        
      </a>
    </div>
    <p>In the support agent example, product documentation and customer resolution history live in separate instances by design. But when the agent is answering a question, it needs context from both places at once. Without cross-instance search, you'd make two separate calls and merge the results yourself.</p><p>The namespace binding exposes a <code>search()</code> method that handles this for you. Pass an array of instance names and get one ranked list back:</p>
            <pre><code>const results = await env.SUPPORT_KB.search({
  query: "billing error",
  ai_search_options: {
    instance_ids: ["product-knowledge", "customer-abc123"]
  }
});
</code></pre>
            <p>Results are merged and ranked across instances. The agent doesn't need to know or care that shared docs and customer resolution history live in separate places. </p>
    <div>
      <h2>How AI Search instances work</h2>
      <a href="#how-ai-search-instances-work">
        
      </a>
    </div>
    <p>So far we've covered how AI Search finds the right results. Now let's look at how you can create and manage your search instances.</p><p>If you used AI Search before this release, you know the setup: create an R2 bucket, link it to an AI Search instance, let AI Search generate a service API token for you, and manage the Vectorize index that gets provisioned on your account. Uploading an object meant writing to R2 and then waiting for a sync job to run before the object was indexed.</p><p>New instances work differently. When you call <code>create()</code>, the instance comes with its own storage and vector index built in. You can upload a file, have it sent for indexing immediately, and poll for indexing status, all with one <code>uploadAndPoll()</code> API call. Once indexing completes, you can search the instance immediately, and there are no external dependencies to wire together.</p>
            <pre><code>const instance = env.AI_SEARCH.get("my-instance");

// upload and wait for indexing to complete
const item = await instance.items.uploadAndPoll("faq.md", content, {
  metadata: { category: "onboarding" }
});
console.log(item.status); // "completed"

// immediately search after indexing is completed
const results = await instance.search({
  // pass the user's query as chat messages instead of the query parameter
  messages: [{ role: "user", content: "onboarding guide" }],
});
</code></pre>
            <p>Each instance can also connect to one external data source (an R2 bucket or a website) and run on a sync schedule, alongside its built-in storage. In the support agent example, <code>product-knowledge</code> is backed by an R2 bucket for shared documentation, while each customer's instance uses built-in storage for context uploaded on the fly.</p>
    <div>
      <h3>Namespaces: create search instances at runtime</h3>
      <a href="#namespaces-create-search-instances-at-runtime">
        
      </a>
    </div>
    <p><code>ai_search_namespaces</code> is a new binding that lets you create search instances dynamically at runtime. It replaces the previous <code>env.AI.autorag()</code> API, which accessed AI Search through the <code>AI</code> binding. The old binding will continue to work, gated by <a href="https://developers.cloudflare.com/workers/configuration/compatibility-dates/"><u>Workers compatibility dates</u></a>.</p>
            <pre><code>// wrangler.jsonc 
{
  "ai_search_namespaces": [
    { "binding": "AI_SEARCH", "namespace": "example" },
  ]
}
</code></pre>
            <p>The namespace binding gives you APIs like <code>create()</code>, <code>delete()</code>, <code>list()</code>, and <code>search()</code> at the namespace level. If you’re creating instances dynamically (e.g. per agent, per customer, per tenant), this is the binding to use.</p>
            <pre><code>// create an instance 
const instance = await env.AI_SEARCH.create({
  id: "my-instance"
});

// delete an instance and all its indexed data
await env.AI_SEARCH.delete("old-instance");
</code></pre>
            
    <div>
      <h3>Pricing for new instances</h3>
      <a href="#pricing-for-new-instances">
        
      </a>
    </div>
    <p>New instances created as of today will get built-in storage and a vector index automatically.</p><p>These instances are free to use while AI Search is in open beta, with the limits listed below. When using a website as a data source, crawling via <a href="https://developers.cloudflare.com/browser-rendering/"><u>Browser Run (formerly Browser Rendering)</u></a> is also now a built-in service, meaning that you won’t be billed for it separately. After beta, the goal is to provide unified pricing for AI Search as a single service, rather than billing separately for each underlying component. Workers AI and <a href="https://www.cloudflare.com/developer-platform/products/ai-gateway/"><u>AI Gateway</u></a> usage will continue to be billed separately.</p><p>We'll give at least 30 days' notice and communicate pricing details before any billing begins.</p><table><tr><th><p><b>Limit</b></p></th><th><p><b>Workers Free</b></p></th><th><p><b>Workers Paid</b></p></th></tr><tr><td><p>AI Search instances per account</p></td><td><p>100</p></td><td><p>5,000</p></td></tr><tr><td><p>Files per instance</p></td><td><p>100,000</p></td><td><p>1M (500K with hybrid search)</p></td></tr><tr><td><p>Max file size</p></td><td><p>4MB</p></td><td><p>4MB</p></td></tr><tr><td><p>Queries per month</p></td><td><p>20,000</p></td><td><p>Unlimited</p></td></tr><tr><td><p>Maximum pages crawled per day</p></td><td><p>500</p></td><td><p>Unlimited</p></td></tr></table><p><i>What about existing instances?</i></p><p>If you created instances before this release, they continue to work exactly as they do today. Your R2 buckets, Vectorize indexes, and Browser Run usage remain on your account and are billed as before. We'll share migration details for existing instances soon.</p>
    <div>
      <h2>Get started today</h2>
      <a href="#get-started-today">
        
      </a>
    </div>
    <p>Search is one of the most fundamental things an agent can do. With AI Search, you don't have to build the infrastructure to make it happen. Create an instance, give it your data, and let your agents search it.</p><p>Get started today by running this command to create your first instance:</p>
            <pre><code>npx wrangler ai-search create my-search
</code></pre>
            <p>Check out the <a href="https://developers.cloudflare.com/ai-search/"><u>docs</u></a> and come tell us what you're building on the <a href="https://discord.cloudflare.com/"><u>Cloudflare Developer Discord</u></a>.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5Y5WLWBuK7NBMLmY6ZWL96/ce7ca954f4f51ac21f8e9d3f15d0343c/BLOG-3240_4.png" />
          </figure><p></p> ]]></content:encoded>
            <category><![CDATA[Agents Week]]></category>
            <category><![CDATA[Agents]]></category>
            <category><![CDATA[AI Search]]></category>
            <category><![CDATA[AI]]></category>
            <guid isPermaLink="false">4l8kYFerKsLkZH2ZVaOoYf</guid>
            <dc:creator>Gabriel Massadas</dc:creator>
            <dc:creator>Miguel Cardoso</dc:creator>
            <dc:creator>Anni Wang</dc:creator>
        </item>
        <item>
            <title><![CDATA[An AI Index for all our customers]]></title>
            <link>https://blog.cloudflare.com/an-ai-index-for-all-our-customers/</link>
            <pubDate>Fri, 26 Sep 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare will soon automatically create an AI-optimized search index for your domain, and expose a set of ready-to-use standard APIs and tools including an MCP server, LLMs.txt, and a search API. ]]></description>
            <content:encoded><![CDATA[ <p>Today, we’re announcing the <b>private beta</b> of <b>AI Index </b>for domains on Cloudflare, a new type of web index that gives content creators the tools to make their data discoverable by AI, and gives AI builders access to better data for fair compensation.</p><p>With AI Index enabled on your domain, we will automatically create an AI-optimized search index for your website, and expose a set of ready-to-use standard APIs and tools including an MCP server, LLMs.txt, and a search API. Our customers will own and control that index and how it’s used, and you will have the ability to monetize access through <a href="https://developers.cloudflare.com/ai-crawl-control/features/pay-per-crawl/what-is-pay-per-crawl/"><u>Pay per crawl</u></a> and the new <a href="https://blog.cloudflare.com/x402/"><u>x402 integrations</u></a>. You will be able to use it to build modern search experiences on your own site, and more importantly, interact with external AI and Agentic providers to make your content more discoverable while being fairly compensated.</p><p>For AI builders—whether developers creating agentic applications, or AI platform companies providing foundational LLM models—Cloudflare will offer a new way to discover and retrieve web content: direct <b>pub/sub connections</b> to individual websites with AI Index. Instead of indiscriminate crawling, builders will be able to subscribe to specific sites that have opted in for discovery, receive structured updates as soon as content changes, and pay fairly for each access. Access is always at the discretion of the site owner.</p><p>From the individual indexes, Cloudflare will also build an aggregated layer, the <b>Open Index</b>, that bundles together participating sites. Builders get a single place to search across collections or the broader web, while every site still retains control and can earn from participation. </p>
    <div>
      <h3>Why build an AI Index?</h3>
      <a href="#why-build-an-ai-index">
        
      </a>
    </div>
    <p>AI platforms are quickly becoming one of the main ways people discover information online. Whether it's a chatbot summarizing a news article or finding a product recommendation, the path to that answer almost always starts with crawling original content, then indexing it or using it for training. Today, however, that process is largely controlled by the platforms: what gets crawled, how often, and whether the site owner has any input in the matter.</p><p>Although Cloudflare now offers tools to monitor and control how AI services access your content and whether they respect your access policies, it's still challenging to make new content visible. Content creators have no efficient way to signal to AI builders when a page is published or updated. For AI builders, on the other hand, crawling and recrawling unstructured content is costly and wasteful, especially when you don’t know the quality and cost in advance.</p><p>We need a fairer and healthier ecosystem for content discovery and usage, one that bridges the gap between content creators and AI builders.</p>
    <div>
      <h3>How AI Index will work</h3>
      <a href="#how-ai-index-will-work">
        
      </a>
    </div>
    <p>When you onboard a domain to Cloudflare, or if you have an existing domain on Cloudflare, you will have the choice to enable an AI Index. If enabled, we will automatically create an AI-optimized search index for your domain that you own and control.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3kV7Oru6D5jPWeGeWDQDsi/7d738250f24250cf98db2e96222319ec/image1.png" />
          </figure><p>As your site updates and grows, the index will evolve with it. New or updated pages will be processed in real-time using the same technology that powers Cloudflare <a href="https://developers.cloudflare.com/ai-search/"><u>AI Search (formerly AutoRAG)</u></a> and its <a href="https://developers.cloudflare.com/ai-search/configuration/data-source/website/"><u>Website</u></a> data source. Best of all, we will manage everything; you won't have to worry about any individual component of compute, storage, databases, embeddings, chunking, or AI models. Everything will happen behind the scenes, automatically.</p><p>Importantly, you will have control over what content to <b>include or exclude</b> from your website's index, and <b>who</b> can get access to your content via <b>AI Crawl Control</b>, ensuring that only the data you want to expose is made searchable and accessible. You will also be able to opt out of the AI Index completely; it will all be up to you.</p><p>When your AI Index is set up, you will get a set of ready-to-use APIs:</p><ul><li><p><b>An MCP Server: </b>Agentic applications will be able to connect directly to your site using the <a href="https://www.cloudflare.com/learning/ai/what-is-model-context-protocol-mcp/"><u>Model Context Protocol (MCP)</u></a>, making your content discoverable to agents in a standardized way. This includes support for <a href="https://developers.cloudflare.com/ai-search/how-to/nlweb/"><u>NLWeb</u></a> tools, an open project developed by Microsoft that defines a standard protocol for natural language queries on websites.</p></li><li><p><b>A flexible search API: </b>This endpoint will return relevant results in structured JSON.
</p></li><li><p><b>LLMs.txt and LLMs-full.txt: </b>Standard files that provide LLMs with a machine-readable map of your site, following <a href="https://github.com/AnswerDotAI/llms-txt"><u>emerging open standards</u></a>. These will help models understand how to use your site’s content at inference time. An example of <a href="https://developers.cloudflare.com/llms.txt"><u>llms.txt</u></a> exists in the Cloudflare Developer Documentation.</p></li><li><p><b>A bulk data API: </b>An endpoint for transferring large amounts of content efficiently, available under the rules you set. Instead of querying for every document, AI providers will be able to ingest it in one shot.</p></li><li><p><b>Pub-sub subscriptions: </b>AI platforms will be able to subscribe to your site’s index and receive events and content updates directly from Cloudflare in a structured format in real-time, making it easy for them to stay current without re-crawling.</p></li><li><p><b>Discoverability directives: </b>Entries in robots.txt and well-known URIs that allow AI agents and crawlers visiting your site to discover and use the available APIs automatically.</p></li></ul>
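<p>For a sense of what such a file looks like, here is a small hand-written example in the emerging llms.txt format (hypothetical content; the file generated for your domain will reflect your actual site):</p>

```markdown
# Example Site

> One-line summary of what the site covers, for models deciding
> whether to read further.

## Docs

- [Getting started](https://example.com/docs/start): setup and first steps
- [API reference](https://example.com/docs/api): endpoints and parameters

## Optional

- [Changelog](https://example.com/changelog): release history
```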
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4Hr3EhsMBH0oVwMVKywwre/2a01efbe03d67a8154123b63c05c000f/image3.png" />
          </figure><p>The index will integrate directly with <a href="https://developers.cloudflare.com/ai-crawl-control/"><u>AI Crawl Control</u></a>, so you will be able to see who’s accessing your content, set rules, and manage permissions. And with <a href="https://developers.cloudflare.com/ai-crawl-control/features/pay-per-crawl/what-is-pay-per-crawl/"><u>Pay per crawl</u></a> and <a href="https://blog.cloudflare.com/x402/"><u>x402 integrations</u></a>, you can choose to directly monetize access to your content. </p>
    <div>
      <h3>A feed of the web for AI builders</h3>
      <a href="#a-feed-of-the-web-for-ai-builders">
        
      </a>
    </div>
    <p>As an AI builder, you will be able to discover and subscribe to high-quality, permissioned web data through individual sites’ AI indexes. Instead of sending crawlers blindly across the open Internet, you will connect via a pub/sub model: participating websites will expose structured updates whenever their content changes, and you will be able to subscribe to receive those updates in real-time. With this model, your new workflow may look something like this:</p><ol><li><p><b>Discover websites that have opted in: </b>Browse and filter through a directory of websites that make their indexes available through Cloudflare.</p></li><li><p><b>Evaluate content with metadata and metrics: </b>Get metadata on various quality metrics (e.g., uniqueness, depth, contextual relevance, popularity) before accessing content.</p></li><li><p><b>Pay fairly for access:</b> When content is valuable, platforms can compensate creators directly through Pay per crawl. These payments not only enable access but also support the continued creation of original content, helping to sustain a healthier ecosystem for discovery.</p></li><li><p><b>Subscribe to updates: </b>Use pub-sub subscriptions to receive events about changes made by the website, so you know when to retrieve or crawl for new content without wasting resources on constant re-crawling.</p></li></ol><p>By shifting from blind crawling to a permissioned pub/sub system for the web, AI builders save time, cut costs, and gain access to cleaner, high-quality data while content creators remain in control and are fairly compensated.</p>
    <div>
      <h3>The aggregated Open Index</h3>
      <a href="#the-aggregated-open-index">
        
      </a>
    </div>
    <p>Individual indexes provide AI platforms with the ability to access data directly from specific sites, allowing them to subscribe for updates, evaluate value, and pay for full content access on a per-site basis. But when builders need to work at a larger scale, managing dozens or hundreds of separate subscriptions can become complex. The <b>Open Index </b>will provide an additional option: a bundled, opt-in collection of those indexes, with filters for quality, uniqueness, originality, and depth of content, all accessible in one place.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6rjkK5UCh9BLSqceUuG0RI/92413aed318baced0ee8812bec511cfb/image2.png" />
          </figure><p>The Open Index is designed to make content discovery at scale easier:</p><ul><li><p><b>Get unified access: </b>Query and retrieve data across many participating sites simultaneously. This reduces integration overhead and enables builders to plug into a curated collection of data, or use it as a ready-made web search layer that can be accessed at query time.</p></li><li><p><b>Discover broader scopes: </b>Work with topic-specific bundles (e.g., news, documentation, scientific research) or a general discovery index covering the broader web. This makes it simple to explore new content sources you may not have identified individually.</p></li><li><p><b>Bottom-up monetization: </b>Results still originate from an individual site’s AI index, with monetization flowing back to that site through Pay per crawl, helping preserve fairness and sustainability at scale.</p></li></ul><p>Together, per-site AI indexes and the Open Index will provide flexibility and precise control when you want full content from individual sites (e.g., for training, AI agents, or search experiences), and broad search coverage when you need a unified search across the web.</p>
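    <p>To make the bundling and filtering idea concrete, here is a minimal sketch of consuming an aggregated result set. The Open Index query interface is not yet public, so the result shape, the score ranges, and the combined-score ranking are all assumptions made for illustration:</p>

```typescript
// Hypothetical entry in an aggregated Open Index result set.
// Field names and 0–1 score ranges are illustrative assumptions.
interface OpenIndexResult {
  site: string;
  url: string;
  topic: string;
  quality: number;    // assumed 0–1 quality score
  uniqueness: number; // assumed 0–1 uniqueness score
}

// Narrow an aggregated result set to one topic bundle, keep only entries
// meeting minimum quality/uniqueness thresholds, and rank the survivors
// by a simple combined score, best first.
function filterBundle(
  results: OpenIndexResult[],
  topic: string,
  minQuality: number,
  minUniqueness: number,
): OpenIndexResult[] {
  return results
    .filter((r) => r.topic === topic && r.quality >= minQuality && r.uniqueness >= minUniqueness)
    .sort((a, b) => (b.quality + b.uniqueness) - (a.quality + a.uniqueness));
}
```

    <p>Because each result still carries its originating <code>site</code>, per-site attribution, and therefore Pay per crawl monetization, survives the aggregation step.</p>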
    <div>
      <h3>How you can participate in the shift</h3>
      <a href="#how-you-can-participate-in-the-shift">
        
      </a>
    </div>
    <p>With AI Index and the Cloudflare Open Index, we’re creating a model where websites decide how their content is accessed, and AI builders receive structured, reliable data at scale to build a fairer and healthier ecosystem for content discovery and usage on the Internet.</p><p>We’re starting with a <b>private beta</b>. If you want to enroll your website into the AI Index or access the pub/sub web feed as an AI builder, you can <a href="https://www.cloudflare.com/aiindex-signup/"><b><u>sign up today</u></b></a>.</p> ]]></content:encoded>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Pay Per Crawl]]></category>
            <category><![CDATA[AI Search]]></category>
            <category><![CDATA[MCP]]></category>
            <guid isPermaLink="false">7rcW6x4j6v7O6ZEHir5fmK</guid>
            <dc:creator>Celso Martinho</dc:creator>
            <dc:creator>Anni Wang</dc:creator>
        </item>
    </channel>
</rss>