
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Fri, 10 Apr 2026 00:53:33 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Deploy your own AI vibe coding platform — in one click! ]]></title>
            <link>https://blog.cloudflare.com/deploy-your-own-ai-vibe-coding-platform/</link>
            <pubDate>Tue, 23 Sep 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Introducing VibeSDK, an open-source AI "vibe coding" platform that anyone can deploy to build their own custom platform. Comes ready with code generation, sandbox environment, and project deployment.  ]]></description>
            <content:encoded><![CDATA[ <p>It’s an exciting time to build applications. With the recent AI-powered <a href="https://www.cloudflare.com/learning/ai/ai-vibe-coding/"><u>"vibe coding"</u></a> boom, anyone can build a website or application by simply describing what they want in a few sentences. We’re already seeing organizations expose this functionality to both their users and internal employees, empowering anyone to build out what they need.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/40Jzjser2hE91b1y3p80pm/7bc1a7f0ee4cfaeb7a39bb413969b189/1.png" />
          </figure><p>Today, we’re excited to open-source an AI vibe coding platform, VibeSDK, to enable anyone to run an entire vibe coding platform themselves, end-to-end, with just one click.</p><p>Want to see it for yourself? Check out our <a href="https://build.cloudflare.dev/"><u>demo platform</u></a> that you can use to create and deploy applications. Or better yet, click the button below to deploy your own AI-powered platform, and dive into the repo to learn about how it’s built.</p><a href="https://deploy.workers.cloudflare.com/?url=https://github.com/cloudflare/vibesdk"><img src="https://deploy.workers.cloudflare.com/button" /></a>
<p></p><p>Deploying VibeSDK sets up everything you need to run your own AI-powered development platform:</p><ul><li><p><b>Integration with LLM models</b> to generate code, build applications, debug errors, and iterate in real-time, powered by <a href="https://developers.cloudflare.com/agents/"><u>Agents SDK</u></a>. </p></li><li><p><b>Isolated development environments </b>that allow users to safely build and preview their applications in secure sandboxes.</p></li><li><p><b>Infinite scale</b> that allows you to deploy thousands or even millions of applications that end users deploy, all served on Cloudflare’s global network</p></li><li><p><b>Observability and caching</b> across multiple AI providers, giving you <a href="https://www.cloudflare.com/learning/performance/what-is-observability/">insight into costs and performance</a> with built-in caching for popular responses. </p></li><li><p><b>Project templates</b> that the LLM can use as a starting point to build common applications and speed up development.</p></li><li><p><b>One-click project export</b> to the user’s Cloudflare account or GitHub repo, so users can take their code and continue development on their own.</p></li></ul>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3jsExecZKmgJHARsxDsMkM/60803b1ba2c68514a053f4d000bf8576/2.png" />
          </figure><p><b>Building an AI vibe coding platform from start to finish</b></p><p><b>Step 0: Get started immediately with VibeSDK</b></p><p>We’re seeing companies build their own AI vibe coding platforms to enable both internal and external users. With a vibe coding platform, internal teams like marketing, product, and support can build their own landing pages, prototypes, or internal tools without having to rely on the engineering team. Similarly, SaaS companies can embed this capability into their product to allow users to build their own customizations. </p><p>Every platform has unique requirements and specializations. By <a href="https://www.cloudflare.com/learning/ai/how-to-get-started-with-vibe-coding/">building your own</a>, you can write custom logic to prompt LLMs for your specific needs, giving your users more relevant results. This also grants you complete control over the development environment and <a href="https://www.cloudflare.com/developer-platform/solutions/hosting/">application hosting</a>, giving you a secure platform that keeps your data private and within your control. </p><p>We wanted to make it easy for anyone to build this themselves, which is why we built a complete platform that comes with project templates, previews, and project deployment. Developers can repurpose the whole platform, or simply take the components they need and customize them to fit their needs.</p><p><b>Step 1: Finding a safe, isolated environment for running untrusted, AI generated code</b></p><p>AI can now build entire applications, but there's a catch: you need somewhere safe to run this untrusted, AI-generated code. Imagine if an <a href="https://www.cloudflare.com/learning/ai/what-is-large-language-model/"><u>LLM</u></a> writes an application that needs to install packages, run build commands, and start a development server — you can't just run this directly on your infrastructure where it might affect other users or systems.</p><p>With <a href="https://developers.cloudflare.com/changelog/2025-06-24-announcing-sandboxes/"><u>Cloudflare Sandboxes</u></a>, you don't have to worry about this. Every user gets their own isolated environment where the AI-generated code can do anything a normal development environment can do: install npm packages, run builds, start servers, but it's fully contained in a secure, container-based environment that can't affect anything outside its sandbox. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1AVtsIotiISgrjspHaRLsg/6cd2e56d5edb7021d63c01183362aafe/3.png" />
          </figure><p>The platform assigns each user to their own sandbox based on their session, so that if a user comes back, they can continue to access the same container with their files intact:</p>
            <pre><code>// Creating a sandbox client for a user session
const sandbox = getSandbox(env.Sandbox, sandboxId);

// Now AI can safely write and execute code in this isolated environment
await sandbox.writeFile('app.js', aiGeneratedCode);
await sandbox.exec('npm install express');
await sandbox.exec('node app.js');</code></pre>
            <p><b>Step 2: Generating the code</b></p><p>Once the sandbox is created, you have a development environment that can bring the code to life. VibeSDK orchestrates the whole workflow from writing the code, installing the necessary packages, and starting the development server. If you ask it to build a to-do app, it will generate the React application, write the component files, run <code>bun install</code> to get the dependencies, and start the server, so you can see the end result. </p><p>Once the user submits their request, the AI will generate all the necessary files, whether it's a React app, Node.js API, or full-stack application, and write them directly to the sandbox:</p>
            <pre><code>async function generateAndWriteCode(instanceId: string) {
    // AI generates the application structure
    const aiGeneratedFiles = await callAIModel("Create a React todo app");
    
    // Write all generated files to the sandbox
    for (const file of aiGeneratedFiles) {
        await sandbox.writeFile(
            `${instanceId}/${file.path}`,
            file.content
        );
        // User sees: "✓ Created src/App.tsx"
        notifyUser(`✓ Created ${file.path}`);
    }
}</code></pre>
            <p>To speed this up even more, we’ve provided a set of templates, stored in an <a href="https://www.cloudflare.com/developer-platform/products/r2/"><u>R2 bucket</u></a>, that the platform can use and quickly customize, instead of generating every file from scratch. This is just an initial set, but you can expand it and add more examples. </p><p><b>Step 3: Getting a preview of your deployment</b></p><p>Once everything is ready, the platform starts the development server and uses the Sandbox SDK to expose it to the internet with a public preview URL which allows users to instantly see their AI-generated application running live:</p>
            <pre><code>// Start the development server in the sandbox
const processId = await sandbox.startProcess(
    `bun run dev`, 
    { cwd: instanceId }
);

// Create a public preview URL 
const preview = await sandbox.exposePort(3000, { 
    hostname: 'preview.example.com' 
});

// User instantly gets: "https://my-app-xyz.preview.example.com"
notifyUser(`✓ Preview ready at: ${preview.url}`);</code></pre>
            <p><b>Step 4: Test, log, fix, repeat</b></p><p>But that’s not all! Throughout this process, the platform will capture console output, build logs, and error messages and feed them back to the LLM for automatic fixes. As the platform makes any updates or fixes, the user can see it all happening live — the file editing, installation progress, and error resolution. </p><p>Deploying applications: From Sandbox to Region Earth</p><p>Once the application is developed, it needs to be deployed. The platform packages everything in the sandbox and then uses a separate specialized "deployment sandbox" to deploy the application to <a href="https://www.cloudflare.com/developer-platform/products/workers/"><u>Cloudflare Workers</u></a>. This deployment sandbox runs <code>wrangler deploy</code> inside the secure environment to publish the application to Cloudflare's global network. </p><p>Since the platform may deploy up to thousands or millions of applications, Workers for Platforms is used to deploy the Workers at scale. Although all the Workers are deployed to the same Namespace, they are all isolated from one another by default, ensuring there’s no cross-tenant access. Once deployed, each application receives its own isolated Worker instance with a unique public URL like <code>my-app.vibe-build.example.com</code>. </p>
            <pre><code>async function deployToWorkersForPlatforms(instanceId: string) {
    // 1. Package the app from development sandbox
    const devSandbox = getSandbox(env.Sandbox, instanceId);
    const packagedApp = await devSandbox.exec('zip -r app.zip .');
    
    // 2. Transfer to specialized deployment sandbox
    const deploymentSandbox = getSandbox(env.DeployerServiceObject, 'deployer');
    await deploymentSandbox.writeFile('app.zip', packagedApp);
    await deploymentSandbox.exec('unzip app.zip');
    
    // 3. Deploy using Workers for Platforms dispatch namespace
    const deployResult = await deploymentSandbox.exec(`
        bunx wrangler deploy \\\\
        --dispatch-namespace vibe-sdk-build-default-namespace
    `);
    
    // Each app gets its own isolated Worker and unique URL
    // e.g., https://my-app.example.com
    return `https://${instanceId}.example.com`;
}</code></pre>
            <p><b>Exportable Applications </b></p><p>The platform also allows users to export their application to their own Cloudflare account and GitHub repo, so they can continue the development on their own. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3itKLSmnTzk2NapoaDQG6e/a0aba26f83cb01db5bd7957a8dfe18f4/Screenshot_2025-09-23_at_9.22.28%C3%A2__AM.png" />
          </figure><p>Observability, caching, and multi-model support built in! </p><p>It's no secret that LLM models have their specialties, which means that when building an AI-powered platform, you may end up using a few different models for different operations. By default, VibeSDK leverages Google’s Gemini models (gemini-2.5-pro, gemini-2.5-flash-lite, gemini-2.5-flash) for project planning, code generation, and debugging. </p><p>VibeSDK is automatically set up with <a href="https://www.cloudflare.com/developer-platform/products/ai-gateway/"><u>AI Gateway</u></a>, so that by default, the platform is able to:</p><ul><li><p>Use a unified access point to <a href="https://blog.cloudflare.com/ai-gateway-aug-2025-refresh/"><u>route requests across LLM providers</u></a>, allowing you to use models from a range of providers (OpenAI, Anthropic, Google, and others)</p></li><li><p>Cache popular responses, so when someone asks to "build a to-do list app", the gateway can serve a cached response instead of going to the provider (saving inference costs)</p></li><li><p>Get observability into the requests, tokens used, and response times across all providers in one place</p></li><li><p>Track costs across models and integrations</p></li></ul><p>Open sourced, so you can build your own Platform! </p><p>We're open-sourcing VibeSDK for the same reason Cloudflare open-sourced the Workers runtime — we believe the best development happens in the open. That's why we wanted to make it as easy as possible for anyone to build their own AI coding platform, whether it's for internal company use, for your website builder, or for the next big vibe coding platform. We tied all the pieces together for you, so you can get started with the click of a button instead of spending months figuring out how to connect everything yourself. To learn more, check out our <a href="https://developers.cloudflare.com/reference-architecture/diagrams/ai/ai-vibe-coding-platform/"><u>reference architecture</u></a> for vibe coding platforms. </p><a href="https://deploy.workers.cloudflare.com/?url=https://github.com/cloudflare/vibesdk"><img src="https://deploy.workers.cloudflare.com/button" /></a>
<p></p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2sjy1OUciwTKnKmhXJfbLQ/d0e01bee3867d639077f134fc6374948/5.png" />
          </figure><div>
  
</div><p></p> ]]></content:encoded>
            <category><![CDATA[AI Week]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Containers]]></category>
            <category><![CDATA[Cloudflare for SaaS]]></category>
            <guid isPermaLink="false">6hS4bQv1FRDVwOoB1HrU3u</guid>
            <dc:creator>Ashish Kumar Singh</dc:creator>
            <dc:creator>Abhishek Kankani</dc:creator>
            <dc:creator>Dina Kozlov</dc:creator>
        </item>
        <item>
            <title><![CDATA[AI Week 2025: Recap]]></title>
            <link>https://blog.cloudflare.com/ai-week-2025-wrapup/</link>
            <pubDate>Wed, 03 Sep 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ How do we embrace the power of AI without losing control? That was one of our big themes for AI Week 2025. Check out all of the products, partnerships, and features we announced. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>How do we embrace the power of AI without losing control? </p><p>That was one of our big themes for AI Week 2025, which has now come to a close. We announced products, partnerships, and features to help companies successfully navigate this new era.</p><p>Everything we built was based on feedback from customers like you that want to get the most out of AI without sacrificing control and safety. Over the next year, we will double down on our efforts to deliver world-class features that augment and secure AI. Please keep an eye on our Blog, AI Avenue, Product Change Log and CloudflareTV for more announcements.</p><p>This week we focused on four core areas to help companies secure and deliver AI experiences safely and securely:</p><ul><li><p><b>Securing AI environments and workflows</b></p></li><li><p><b>Protecting original content from misuse by AI</b></p></li><li><p><b>Helping developers build world-class, secure, AI experiences </b></p></li><li><p><b>Making Cloudflare better for you with AI</b></p></li></ul><p>Thank you for following along with our first ever AI week at Cloudflare. This recap blog will summarize each announcement across these four core areas. For more information, check out our “<a href="http://thisweekinnet.com"><u>This Week in NET</u></a>” recap episode also featured at the end of this blog.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1JQHvkcThqyE3f21FjM59I/20e41ab0d3c4aaecbedc6d51b5c1f9f8/BLOG-2933_2.png" />
          </figure>
    <div>
      <h2>Securing AI environments and workflows</h2>
      <a href="#securing-ai-environments-and-workflows">
        
      </a>
    </div>
    <p>These posts and features focused on helping companies control and understand their employee’s usage of AI tools.</p><table><tr><td><p><b>Blog</b></p></td><td><p><b>Recap</b></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/ai-prompt-protection/">Beyond the ban: A better way to secure generative AI applications</a></p></td><td><p>Generative AI tools present a trade-off of productivity and data risk. Cloudflare One’s new AI prompt protection feature provides the visibility and control needed to govern these tools, allowing organizations to confidently embrace AI.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/shadow-AI-analytics/">Unmasking the Unseen: Your Guide to Taming Shadow AI with Cloudflare One</a></p></td><td><p>Don't let "Shadow AI" silently leak your data to unsanctioned AI. This new threat requires a new defense. Learn how to gain visibility and control without sacrificing innovation.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/confidence-score-rubric/">Introducing Cloudflare Application Confidence Score For AI Applications</a></p></td><td><p>Cloudflare will provide confidence scores within our application library for Gen AI applications, allowing customers to assess their risk for employees using shadow IT.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/casb-ai-integrations/">ChatGPT, Claude, &amp; Gemini security scanning with Cloudflare CASB</a></p></td><td><p>Cloudflare CASB now scans ChatGPT, Claude, and Gemini for misconfigurations, sensitive data exposure, and compliance issues, helping organizations adopt AI with confidence.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/zero-trust-mcp-server-portals/">Securing the AI Revolution: Introducing Cloudflare MCP Server Portals</a></p></td><td><p>Cloudflare MCP Server Portals are now available in Open Beta. MCP Server Portals are a new capability that enable you to centralize, secure, and observe every MCP connection in your organization.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/best-practices-sase-for-ai/">Best Practices for Securing Generative AI with SASE</a></p></td><td><p>This guide provides best practices for Security and IT leaders to securely adopt generative AI using Cloudflare’s SASE architecture as part of a strategy for AI Security Posture Management (AI-SPM).</p></td></tr></table>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3q82P48XrTFDEWKBiIWlVC/d9c1bfa96d7b170df2f66577767d1ecc/BLOG-2933_3.png" />
          </figure>
    <div>
      <h2>Protecting original content from misuse by AI</h2>
      <a href="#protecting-original-content-from-misuse-by-ai">
        
      </a>
    </div>
    <p>Cloudflare is committed to helping content creators control access to their original work. These announcements focused on analysis of what we’re currently seeing on the Internet with respect to AI bots and crawlers and significant improvements to our existing control features.</p><table><tr><td><p><b>Blog</b></p></td><td><p><b>Recap</b></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/ai-crawler-traffic-by-purpose-and-industry/">A deeper look at AI crawlers: breaking down traffic by purpose and industry</a></p></td><td><p>We are extending AI-related insights on Cloudflare Radar with new industry-focused data and a breakdown of bot traffic by purpose, such as training or user action.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/signed-agents/">The age of agents: cryptographically recognizing agent traffic</a></p></td><td><p>Cloudflare now lets websites and bot creators use Web Bot Auth to segment agents from verified bots, making it easier for customers to allow or disallow the many types of user and partner directed.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/conversational-search-with-nlweb-and-autorag/">Make Your Website Conversational for People and Agents with NLWeb and AutoRAG</a></p></td><td><p>With NLWeb, an open project by Microsoft, and Cloudflare AutoRAG, conversational search is now a one-click setup for your website.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/introducing-ai-crawl-control/">The next step for content creators in working with AI bots: Introducing AI Crawl Control</a></p></td><td><p>Cloudflare launches AI Crawl Control (formerly AI Audit) and introduces easily customizable 402 HTTP responses.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/crawlers-click-ai-bots-training/">The crawl-to-click gap: Cloudflare data on AI bots, training, and referrals</a></p></td><td><p>By mid-2025, training drives nearly 80% of AI crawling, while referrals to publishers (especially from Google) are falling and crawl-to-refer ratios show AI consumes far more than it sends back.</p></td></tr></table>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2XxME3f6wr64laagnl7fMR/d6929874d74637eec7d0227de0c33211/BLOG-2933_4.png" />
          </figure>
    <div>
      <h2>Helping developers build world-class, secure, AI experiences</h2>
      <a href="#helping-developers-build-world-class-secure-ai-experiences">
        
      </a>
    </div>
    <p>At Cloudflare we are committing to building the best platform to build AI experiences, all with security by default.</p><table><tr><td><p><b>Blog</b></p></td><td><p><b>Recap</b></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/ai-gateway-aug-2025-refresh/">AI Gateway now gives you access to your favorite AI models, dynamic routing and more — through just one endpoint</a></p></td><td><p>AI Gateway now gives you access to your favorite AI models, dynamic routing and more — through just one endpoint.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/cloudflares-most-efficient-ai-inference-engine/">How we built the most efficient inference engine for Cloudflare’s network</a></p></td><td><p>Infire is an LLM inference engine that employs a range of techniques to maximize resource utilization, allowing us to serve AI models more efficiently with better performance for Cloudflare workloads.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/workers-ai-partner-models/">State-of-the-art image generation Leonardo models and text-to-speech Deepgram models now available in Workers AI</a></p></td><td><p>We're expanding Workers AI with new partner models from Leonardo.Ai and Deepgram. Start using state-of-the-art image generation models from Leonardo and real-time TTS and STT models from Deepgram.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/how-cloudflare-runs-more-ai-models-on-fewer-gpus/">How Cloudflare runs more AI models on fewer GPUs: A technical deep-dive</a></p></td><td><p>Cloudflare built an internal platform called Omni. This platform uses lightweight isolation and memory over-commitment to run multiple AI models on a single GPU.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/welcome-to-ai-avenue/">Cloudflare Launching AI Miniseries for Developers (and Everyone Else They Know)</a></p></td><td><p>In AI Avenue, we address people’s fears, show them the art of the possible, and highlight the positive human stories where AI is augmenting — not replacing — what people can do. And yes, we even let people touch AI themselves.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/block-unsafe-llm-prompts-with-firewall-for-ai/">Block unsafe prompts targeting your LLM endpoints with Firewall for AI</a></p></td><td><p>Cloudflare's AI security suite now includes unsafe content moderation, integrated into the Application Security Suite via Firewall for AI.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/cloudflare-realtime-voice-ai/">Cloudflare is the best place to build realtime voice agents</a></p></td><td><p>Today, we're excited to announce new capabilities that make it easier than ever to build real-time, voice-enabled AI applications on Cloudflare's global network.</p></td></tr></table>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/69qL26BPP68czkSiBGVkuM/2e916e61473354bff2806ac0d8a2517a/BLOG-2933_5.png" />
          </figure>
    <div>
      <h2>Making Cloudflare better for you with AI</h2>
      <a href="#making-cloudflare-better-for-you-with-ai">
        
      </a>
    </div>
    <p>Cloudflare logs and analytics can often be a needle in the haystack challenge, AI helps surface and alert to issues that need attention or review. Instead of a human having to spend hours sifting and searching for an issue, they can focus on action and remediation while AI does the sifting.</p><table><tr><td><p><b>Blog</b></p></td><td><p><b>Except</b></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/background-removal/">Evaluating image segmentation models for background removal for Images</a></p></td><td><p>An inside look at how the Images team compared dichotomous image segmentation models to identify and isolate subjects in an image from the background.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/automating-threat-analysis-and-response-with-cloudy/">Automating threat analysis and response with Cloudy</a></p></td><td><p>Cloudy now supercharges analytics investigations and Cloudforce One threat intelligence! Get instant insights from threat events and APIs on APTs, DDoS, cybercrime &amp; more - powered by Workers AI!</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/cloudy-driven-email-security-summaries/">Cloudy Summarizations of Email Detections: Beta Announcement</a></p></td><td><p>We're now leveraging our internal LLM, Cloudy, to generate automated summaries within our Email Security product, helping SOC teams better understand what's happening within flagged messages.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/AI-troubleshoot-warp-and-network-connectivity-issues/">Troubleshooting network connectivity and performance with Cloudflare AI</a></p></td><td><p>Troubleshoot network connectivity issues by using Cloudflare AI-Power to quickly self diagnose and resolve WARP client and network issues.</p></td></tr></table><p>We thank you for following along this week — and please stay tuned for exciting announcements coming during Cloudflare’s 15th birthday week in September!</p><p>Check out the full video recap, featuring insights from Kenny Johnson and host João Tomé, in our special This Week in NET episode (<a href="https://thisweekinnet.com">ThisWeekinNET.com</a>) covering everything announced during AI Week 2025.</p><div>
  
</div><p></p> ]]></content:encoded>
            <category><![CDATA[AI Week]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[AI Gateway]]></category>
            <category><![CDATA[Generative AI]]></category>
            <category><![CDATA[Workers AI]]></category>
            <category><![CDATA[AI WAF]]></category>
            <category><![CDATA[AI Bots]]></category>
            <guid isPermaLink="false">6l0AjZFdEn4hrKgQlWOYiB</guid>
            <dc:creator>Kenny Johnson</dc:creator>
            <dc:creator>James Allworth</dc:creator>
        </item>
        <item>
            <title><![CDATA[Automating threat analysis and response with Cloudy ]]></title>
            <link>https://blog.cloudflare.com/automating-threat-analysis-and-response-with-cloudy/</link>
            <pubDate>Fri, 29 Aug 2025 14:05:00 GMT</pubDate>
            <description><![CDATA[ Cloudy now supercharges analytics investigations and Cloudforce One threat intelligence! Get instant insights from threat events and APIs on APTs, DDoS, cybercrime & more - powered by Workers AI. ]]></description>
            <content:encoded><![CDATA[ <p>Security professionals everywhere face a paradox: while more data provides the visibility needed to catch threats, it also makes it harder for humans to process it all and find what's important. When there’s a sudden spike in suspicious traffic, every second counts. But for many security teams — especially lean ones — it’s hard to quickly figure out what’s going on. Finding a root cause means diving into dashboards, filtering logs, and cross-referencing threat feeds. All the data tracking that has happened can be the very thing that slows you down — or worse yet, what buries the threat that you’re looking for. </p><p>Today, we’re excited to announce that we’ve solved that problem. We’ve integrated <a href="https://blog.cloudflare.com/introducing-ai-agent/"><u>Cloudy</u></a> — Cloudflare’s first <a href="https://www.cloudflare.com/learning/ai/what-is-agentic-ai/"><u>AI agent</u></a> — with our security analytics functionality, and we’ve also built a new, conversational interface that Cloudflare users can use to ask questions, refine investigations, and get answers.  With these changes, Cloudy can now help Cloudflare users find the needle in the digital haystack, making security analysis faster and more accessible than ever before.  </p><p>Since Cloudy’s launch in March of this year, its adoption has been exciting to watch. Over <b>54,000</b> users have tried Cloudy for <a href="https://developers.cloudflare.com/waf/custom-rules/"><u>custom rule</u></a> creation, and <b>31%</b> of them have deployed a rule suggested by the agent. For our log explainers in <a href="https://www.cloudflare.com/zero-trust/products/gateway/"><u>Cloudflare Gateway</u></a>, Cloudy has been loaded over <b>30,000 </b> times in just the last month, with <b>80%</b> of the feedback we received confirming the summaries were insightful. We are excited to empower our users to do even more.</p>
    <div>
      <h2>Talk to your traffic: a new conversational interface for faster RCA and mitigation</h2>
      <a href="#talk-to-your-traffic-a-new-conversational-interface-for-faster-rca-and-mitigation">
        
      </a>
    </div>
    <p>Security analytics dashboards are powerful, but they often require you to know exactly what you're looking for — and the right queries to get there. The new Cloudy chat interface changes this. It is designed for faster root cause analysis (RCA) of traffic anomalies, helping you get from “something’s wrong” to “here’s the fix” in minutes. You can now start with a broad question and narrow it down, just like you would with a human analyst.</p><p>For example, you can start an investigation by asking Cloudy to look into a recommendation from Security Analytics.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1P7YDzX9JoHmmKLPwGw0z8/aa3675b36492ea13e2cba4d1ba13dce4/image4.png" />
          </figure>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6Nort6ZEZUUkYQc8PTiLgo/33a92121c4c161290f50e792d77c1e16/image1.png" />
          </figure><p>From there, you can ask follow-up questions to dig deeper:</p><ul><li><p>"Focus on login endpoints only."</p></li><li><p>"What are the top 5 IP addresses involved?"</p></li><li><p>"Are any of these IPs known to be malicious?"</p></li></ul><p>This is just the beginning of how Cloudy is transforming security. You can <a href="http://blog.cloudflare.com/cloudy-driven-email-security-summaries/"><u>read more</u></a> about how we’re using Cloudy to bring clarity to another critical security challenge: automating summaries of email detections. This is the same core mission — translating complex security data into clear, actionable insights — but applied to the constant stream of email threats that security teams face every day.</p>
    <div>
      <h2>Use Cloudy to understand, prioritize, and act on threats</h2>
      <a href="#use-cloudy-to-understand-prioritize-and-act-on-threats">
        
      </a>
    </div>
    <p>Analyzing your own logs is powerful — but it only shows part of the picture. What if Cloudy could look beyond your own data and into Cloudflare’s global network to identify emerging threats? This is where Cloudforce One's <a href="https://blog.cloudflare.com/threat-events-platform/"><u>Threat Events platform</u></a> comes in.</p><p>Cloudforce One translates the high-volume attack data observed on the Cloudflare network into real-time, attacker-attributed events relevant to your organization. This platform helps you track adversary activity at scale — including APT infrastructure, cybercrime groups, compromised devices, and volumetric DDoS activity. Threat events provide detailed, context-rich events, including interactive timelines and mappings to attacker TTPs, regions, and targeted verticals. </p><p>We have spent the last few months making Cloudy more powerful by integrating it with the Cloudforce One Threat Events platform.  Cloudy now can offer contextual data about the threats we observe and mitigate across Cloudflare's global network, spanning everything from APT activity and residential proxies to ACH fraud, DDoS attacks, WAF exploits, cybercrime, and compromised devices. This integration empowers our users to quickly understand, prioritize, and act on <a href="https://www.cloudflare.com/learning/security/what-are-indicators-of-compromise/"><u>indicators of compromise (IOCs)</u></a> based on a vast ocean of real-time threat data. </p><p>Cloudy lets you query this global dataset in a natural language and receive clear, concise answers. For example, imagine asking these questions and getting immediate actionable answers:</p><ul><li><p>Who is targeting my industry vertical or country?</p></li><li><p>What are the most relevant indicators (IPs, JA3/4 hashes, ASNs, domains, URLs, SHA fingerprints) to block right now?</p></li><li><p>How has a specific adversary progressed across the cyber kill chain over time?</p></li><li><p>What novel new threats are threat actors using that might be used against your network next, and what insights do Cloudflare analysts know about them?</p></li></ul><p>Simply interact with Cloudy in the Cloudflare Dashboard &gt; Security Center &gt; Threat Intelligence, providing your queries in natural language. It can walk you from a single indicator (like an IP address or domain) to the specific threat event Cloudflare observed, and then pivot to other related data — other attacks, related threats, or even other activity from the same actor. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4WE42KXmWzejXpk8CsG05h/2fe63d5f86fe78642a341d645844ab56/image2.png" />
          </figure><p>This cuts through the noise, so you can quickly understand an adversary's actions across the cyber kill chain and MITRE ATT&amp;CK framework, and then block attacks with precise, actionable intelligence. The threat events platform is like an evidence board on the wall that helps you understand threats; Cloudy is like your sidekick that will run down every lead.</p>
    <div>
      <h2>How it works: Agents SDK and Workers AI</h2>
      <a href="#how-it-works-agents-sdk-and-workers-ai">
        
      </a>
    </div>
    <p>Developing this advanced capability for Cloudy was a testament to the agility of Cloudflare's AI ecosystem. We leveraged our <a href="https://developers.cloudflare.com/agents/"><u>Agents SDK</u></a> running on <a href="https://developers.cloudflare.com/workers-ai/"><u>Workers AI</u></a>. This allowed for rapid iteration and deployment, ensuring Cloudy could quickly grasp the nuances of threat intelligence and provide highly accurate, contextualized insights. The combination of our massive network telemetry, purpose-built LLM prompts, and the flexibility of Workers AI means Cloudy is not just fast, but also remarkably precise.</p><p>And a quick word on what we didn’t do when developing Cloudy: We did not train Cloudy on any Cloudflare customer data. Instead, Cloudy relies on models made publicly available through <a href="https://developers.cloudflare.com/workers-ai/models/"><u>Workers AI</u></a>. For more information on Cloudflare’s approach to responsible AI, please see <a href="https://www.cloudflare.com/trust-hub/responsible-ai/"><u>these FAQs</u></a>.</p>
    <div>
      <h2>What's next for Cloudy</h2>
      <a href="#whats-next-for-cloudy">
        
      </a>
    </div>
    <p>This is just the next step in Cloudy’s journey. We're working on expanding Cloudy's abilities across the board. This includes intelligent debugging for WAF rules and deeper integrations with Alerts to give you more actionable, contextual notifications. At the same time, we are continuously enriching our threat events datasets and exploring ways for Cloudy to help you visualize complex attacker timelines, campaign overviews, and intricate attack graphs. Our goal remains the same: make Cloudy an indispensable partner in understanding and reacting to the security landscape.</p><p>The new chat interface is now available on all plans, and the threat intelligence capabilities are live for Cloudforce One customers. Learn more about Cloudforce One <a href="https://www.cloudflare.com/application-services/products/cloudforceone/"><u>here</u></a> and reach out for a <a href="https://www.cloudflare.com/plans/enterprise/contact/?utm_medium=referral&amp;utm_source=blog&amp;utm_campaign=2025-q3-acq-gbl-connectivity-ge-ge-general-ai_week_blog"><u>consultation</u></a> if you want to go deeper with our experts.</p><div>
  
</div><p></p> ]]></content:encoded>
            <category><![CDATA[AI Week]]></category>
            <category><![CDATA[Cloudy]]></category>
            <category><![CDATA[Cloudforce One]]></category>
            <category><![CDATA[Threat Intelligence]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Workers AI]]></category>
            <guid isPermaLink="false">26RGd07uODP8AQ5WaxcjnF</guid>
            <dc:creator>Alexandra Moraru</dc:creator>
            <dc:creator>Harsh Saxena</dc:creator>
            <dc:creator>Steve James</dc:creator>
            <dc:creator>Nick Downie</dc:creator>
            <dc:creator>Levi Kipke</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudy Summarizations of Email Detections: Beta Announcement]]></title>
            <link>https://blog.cloudflare.com/cloudy-driven-email-security-summaries/</link>
            <pubDate>Fri, 29 Aug 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ We're now leveraging our internal LLM, Cloudy, to generate automated summaries within our Email Security product, helping SOC teams better understand what's happening within flagged messages. ]]></description>
            <content:encoded><![CDATA[ 
    <div>
      <h2>Background</h2>
      <a href="#background">
        
      </a>
    </div>
    <p>Organizations face continuous threats from <a href="https://www.cloudflare.com/learning/access-management/phishing-attack/"><u>phishing</u></a>,<a href="https://www.cloudflare.com/learning/email-security/business-email-compromise-bec/"><u> business email compromise (BEC)</u></a>, and other advanced email attacks. Attackers <a href="https://www.cloudflare.com/the-net/multichannel-phishing/"><u>adapt their tactics</u></a> daily, forcing defenders to move just as quickly to keep inboxes safe.</p><p>Cloudflare’s visibility across a large portion of the Internet gives us an unparalleled view of malicious campaigns. We process billions of email threat signals every day, feeding them into multiple AI and machine learning models. This lets our detection team create and deploy new rules at high speed, blocking malicious and unwanted emails before they reach the inbox.</p><p>But rapid protection introduces a new challenge: making sure security teams understand exactly what we blocked — and why.</p>
    <div>
      <h2>The Challenge</h2>
      <a href="#the-challenge">
        
      </a>
    </div>
    <p>Cloudflare’s fast-moving detection pipeline is one of our greatest strengths — but it also creates a communication gap for customers. Every day, our detection analysts publish new rules to block phishing, BEC, and other unwanted messages. These rules often blend signals from multiple AI and machine learning models, each looking at different aspects of a message like its content, headers, links, attachments, and sender reputation.</p><p>While this layered approach catches threats early, SOC teams don’t always have insight into the specific combination of factors that triggered a detection. Instead, they see a rule name in the investigation tab with little explanation of what it means.</p><p>Take the rule <i>BEC.SentimentCM_BEC.SpoofedSender</i> as an example. Internally, we know this indicates:</p><ul><li><p>The email contained no unique links or attachments a common BEC pattern</p></li><li><p>It was flagged as highly likely to be BEC by our Churchmouse sentiment analysis models</p></li><li><p>Spoofing indicators were found, such as anomalies in the envelope_from header</p></li></ul><p>Those details are second nature to our detection team, but without that context, SOC analysts are left to reverse-engineer the logic from opaque labels. They don’t see the nuanced ML outputs (like Churchmouse’s sentiment scoring) or the subtle header anomalies, or the sender IP/domain reputation data that factored into the decision.</p><p>The result is time lost to unclear investigations or the risk of mistakenly releasing malicious emails. For teams operating under pressure, that’s more than just an inconvenience, it's a security liability.</p><p>That’s why we extended Cloudy (our AI-powered agent) to translate complex detection logic into clear explanations, giving SOC teams the context they need without slowing them down.</p>
    <div>
      <h2>Enter Cloudy Summaries</h2>
      <a href="#enter-cloudy-summaries">
        
      </a>
    </div>
    <p>Several weeks ago, we launched Cloudy within our Cloudflare One product suite to help customers understand gateway policies and their impacts (you can read more about the launch here: https://blog.cloudflare.com/introducing-ai-agent/).</p><p>We began testing Cloudy's ability to explain the detections and updates we continuously deploy. Our first attempt revealed significant challenges.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/63bsCRl2hKUyECh1vJND5k/a033fce3c95a635ede07e1fd03a9edf5/image3.png" />
          </figure>
    <div>
      <h3>The Hallucination Problem</h3>
      <a href="#the-hallucination-problem">
        
      </a>
    </div>
    <p>We observed frequent LLM <a href="https://www.cloudflare.com/learning/ai/what-are-ai-hallucinations/"><u>hallucinations</u></a>, the model generating inaccurate information about messages. While this might be acceptable when analyzing logs, it's dangerous for email security detections. A hallucination claiming a malicious message is clean could lead SOC analysts to release it from quarantine, potentially causing a security breach.</p><p>These hallucinations occurred because email detections involve numerous and complex inputs. Our scanning process runs messages through multiple ML algorithms examining different components: body content, attachments, links, IP reputation, and more. The same complexity that makes manual detection explanation difficult also caused our initial LLM implementation to produce inconsistent and sometimes inaccurate outputs.</p>
    <div>
      <h3>Building Guardrails</h3>
      <a href="#building-guardrails">
        
      </a>
    </div>
    <p>To minimize hallucination risk while maintaining inbox security, we implemented several manual safeguards:</p><p><b>Step 1: RAG Implementation</b></p><p>We ensured Cloudy only accessed information from our detection dataset corpus, creating a <a href="https://www.cloudflare.com/learning/ai/retrieval-augmented-generation-rag/"><u>Retrieval-Augmented Generation (RAG)</u></a> system. This significantly reduced hallucinations by grounding the LLM's assessments in actual detection data.</p><p><b>Step 2: Model Context Enhancement</b></p><p>We added crucial context about our internal models. For example, the "Churchmouse" designation refers to a group of sentiment detection models, not a single algorithm. Without this context, Cloudy attempted to define "churchmouse" using the common idiom "poor as a church mouse" referencing starving church mice because holy bread never falls to the floor. While historically interesting, this was completely irrelevant to our security context.</p>
    <div>
      <h3>Current Results</h3>
      <a href="#current-results">
        
      </a>
    </div>
    <p>Our testing shows Cloudy now produces more stable explanations with minimal hallucinations. For example, the detection <i>SPAM.ASNReputation.IPReputation_Scuttle.Anomalous_HC</i> now generates this summary:</p><p>"This rule flags email messages as spam if they come from a sender with poor Internet reputation, have been identified as suspicious by a blocklist, and have unusual email server setup, indicating potential malicious activity."</p><p>This strikes the right balance. Customers can quickly understand what the detection found and why we classified the message accordingly.</p>
    <div>
      <h2>Beta Program</h2>
      <a href="#beta-program">
        
      </a>
    </div>
    <p>We're opening Cloudy email detection summaries to a select group of beta users. Our primary goal is ensuring our guardrails prevent hallucinations that could lead to security compromises. During this beta phase, we'll rigorously test outputs and verify their quality before expanding access to all customers.</p>
    <div>
      <h2>Ready to enhance your email security?</h2>
      <a href="#ready-to-enhance-your-email-security">
        
      </a>
    </div>
    <p>We provide all organizations (whether a Cloudflare customer or not) with free access to our Retro Scan tool, allowing them to use our predictive AI models to scan existing inbox messages. Retro Scan will detect and highlight any threats found, enabling organizations to remediate them directly in their email accounts. With these insights, organizations can implement further controls, either using <a href="https://www.cloudflare.com/zero-trust/products/email-security/"><u>Cloudflare Email Security</u></a> or their preferred solution, to prevent similar threats from reaching their inboxes in the future.</p><p>If you are interested in how Cloudflare can help secure your inboxes, sign up for a phishing risk assessment <a href="https://www.cloudflare.com/lp/email-security-self-guided-demo-request/?utm_medium=referral&amp;utm_source=blog&amp;utm_campaign=2025-q3-acq-gbl-modernsec-es-ge-general-ai_week_blog"><u>here</u></a>. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/lV6mxQTYwaS6j0n0e8arE/fd62cf8032b15780690f4ed48578d3fc/image2.png" />
          </figure><div>
  
</div><p></p> ]]></content:encoded>
            <category><![CDATA[AI Week]]></category>
            <category><![CDATA[Cloud Email Security]]></category>
            <category><![CDATA[LLM]]></category>
            <guid isPermaLink="false">hzXLKdI5wqNlvwd0JKzXS</guid>
            <dc:creator>Ayush Kumar</dc:creator>
            <dc:creator>Nick Blazier</dc:creator>
            <dc:creator>Phil Syme</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare is the best place to build realtime voice agents]]></title>
            <link>https://blog.cloudflare.com/cloudflare-realtime-voice-ai/</link>
            <pubDate>Fri, 29 Aug 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Today, we're excited to announce new capabilities that make it easier than ever to build real-time, voice-enabled AI applications on Cloudflare's global network. ]]></description>
            <content:encoded><![CDATA[ <p>The way we interact with AI is fundamentally changing. While text-based interfaces like ChatGPT have shown us what's possible, in terms of interaction, it’s only the beginning. Humans communicate not only by texting, but also talking — we show things, we interrupt and clarify in real-time. Voice AI brings these natural interaction patterns to our applications.</p><p>Today, we're excited to announce new capabilities that make it easier than ever to build real-time, voice-enabled AI applications on Cloudflare's global network. These new features create a complete platform for developers building the next generation of conversational AI experiences or can function as building blocks for more advanced AI agents running across platforms.</p><p>We're launching:</p><ul><li><p><b>Cloudflare Realtime Agents</b> - A runtime for orchestrating voice AI pipelines at the edge</p></li><li><p><b>Pipe raw WebRTC audio as PCM in Workers</b> - You can now connect WebRTC audio directly to your AI models or existing complex media pipelines already built on </p></li><li><p><b>Workers AI WebSocket support</b> - Realtime AI inference with models like PipeCat's smart-turn-v2</p></li><li><p><b>Deepgram on Workers AI</b> - Speech-to-text and text-to-speech running in over 330 cities worldwide</p></li></ul>
    <div>
      <h2>Why realtime AI matters now</h2>
      <a href="#why-realtime-ai-matters-now">
        
      </a>
    </div>
    <p>Today, building voice AI applications is hard. You need to coordinate multiple services such as speech-to-text, language models, text-to-speech while managing complex audio pipelines, handling interruptions, and keeping latency low enough for natural conversation. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/a2D2mbrkDnb0tRo5466DN/8a4643e52a5f23b6948f1d15671140ac/image4.jpg" />
          </figure><p>Building production voice AI requires orchestrating a complex symphony of technologies. You need low latency speech recognition, intelligent language models that understand context and can handle interruptions, natural-sounding voice synthesis, and all of this needs to happen in under 800 milliseconds — the threshold where conversation feels natural rather than stilted. This latency budget is unforgiving. Every millisecond counts: 40ms for microphone input, 300ms for transcription, 400ms for LLM inference, 150ms for text-to-speech. Any additional latency from poor infrastructure choices or distant servers transforms a delightful experience into a frustrating one.</p><p>That's why we're building real-time AI tools: we want to make real-time voice AI as easy to deploy as a static website. We're also witnessing a critical inflection point where conversational AI moves from experimental demos to production-ready systems that can scale globally. If you’re already a developer in the real-time AI ecosystem, we want to build the best building blocks for you to get the lowest latency by leveraging the 330+ datacenters Cloudflare has built.</p>
    <div>
      <h2>Introducing Cloudflare Realtime Agents</h2>
      <a href="#introducing-cloudflare-realtime-agents">
        
      </a>
    </div>
    <p>Cloudflare Realtime Agents is a simple runtime for orchestrating voice AI pipelines that run on our global network, as close to your users as possible. Instead of managing complex infrastructure yourself, you can focus on building great conversational experiences.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1QcKOdouzGYP8DecqqSzM8/022a33e9b7bcbcbd0461fa83df39b1ba/image1.png" />
          </figure>
    <div>
      <h3>How it works</h3>
      <a href="#how-it-works">
        
      </a>
    </div>
    <p>When a user connects to your voice AI application, here's what happens:</p><ol><li><p><b>WebRTC connection</b> - Audio streams from the user's device is sent to the nearest Cloudflare location via WebRTC, using Cloudflare RealtimeKit mobile or web SDKs</p></li><li><p><b>AI pipeline orchestration</b> - Your pre-configured pipeline runs: speech-to-text → LLM → text-to-speech, with support for interruption detection and turn-taking</p></li><li><p><b>Your configured runtime options/callbacks/tools run</b></p></li><li><p><b>Response delivery</b> - Generated audio streams back to the user with minimal latency</p></li></ol><p>The magic is in how we've designed this as composable building blocks. You're not locked into a rigid pipeline — you can configure data flows, add tee and join operations, and control exactly how your AI agent behaves.</p><p>Take a look at the <code>MyTextHandler</code> function from the above diagram, for example. It’s just a function that takes in text and returns text back, inserted after speech-to-text and before text-to-speech:</p>
            <pre><code>class MyTextHandler extends TextComponent {
	env: Env;

	constructor(env: Env) {
		super();
		this.env = env;
	}

	async onTranscript(text: string) {
		const { response } = await this.env.AI.run('@cf/meta/llama-3.1-8b-instruct', {
			prompt: "You are a wikipedia bot, answer the user query:" + text,
		});
		this.speak(response!);
	}
}</code></pre>
            <p>Your agent is a JavaScript class that extends RealtimeAgent, where you initialize a pipeline consisting of the various text-to-speech, speech-to-text, text-to-text and even speech-to-speech transformations.</p>
            <pre><code>export class MyAgent extends RealtimeAgent&lt;Env&gt; {
	constructor(ctx: DurableObjectState, env: Env) {
		super(ctx, env);
	}

	async init(agentId: string ,meetingId: string, authToken: string, workerUrl: string, accountId: string, apiToken: string) {
		// Construct your text processor for generating responses to text
		const textHandler = new MyTextHandler(this.env);
		// Construct a Meeting object to join the RTK meeting
		const transport = new RealtimeKitTransport(meetingId, authToken, [
			{
				media_kind: 'audio',
				stream_kind: 'microphone',
			},
		]);
		const { meeting } = transport;

		// Construct a pipeline to take in meeting audio, transcribe it using
		// Deepgram, and pass our generated responses through ElevenLabs to
		// be spoken in the meeting
		await this.initPipeline(
			[transport, new DeepgramSTT(this.env.DEEPGRAM_API_KEY), textHandler, new ElevenLabsTTS(this.env.ELEVENLABS_API_KEY), transport],
			agentId,
			workerUrl,
			accountId,
			apiToken,
		);

		// The RTK meeting object is accessible to us, so we can register handlers
		// on various events like participant joins/leaves, chat, etc.
		// This is optional
		meeting.participants.joined.on('participantJoined', (participant) =&gt; {
			textHandler.speak(`Participant Joined ${participant.name}`);
		});
		meeting.participants.joined.on('participantLeft', (participant) =&gt; {
			textHandler.speak(`Participant Left ${participant.name}`);
		});

		// Make sure to actually join the meeting after registering all handlers
		await meeting.rtkMeeting.join();
	}

	async deinit() {
		// Add any other cleanup logic required
		await this.deinitPipeline();
	}
}</code></pre>
            <p>View a full example in the <a href="https://developers.cloudflare.com/realtime/agents/getting-started/"><u>developer docs</u></a> and get your own Realtime Agent running. View <a href="https://dash.cloudflare.com/?to=/:account/realtime/agents"><u>Realtime Agents</u></a> on your dashboard.</p>
    <div>
      <h3>Built for flexibility</h3>
      <a href="#built-for-flexibility">
        
      </a>
    </div>
    <p>What makes Realtime Agents powerful is its flexibility:</p><ul><li><p><b>Many AI provider options</b> - Use the models on Workers AI, OpenAI, Anthropic, or any provider through AI Gateway</p></li><li><p><b>Multiple input/output modes</b> - Accept audio and/or text and respond with audio and/or text</p></li><li><p><b>Stateful coordination</b> - Maintain context across the conversation without managing complex state yourself</p></li><li><p><b>Speed and flexibility</b> - use <a href="https://realtime.cloudflare.com"><u>RealtimeKit</u></a> to manage WebRTC sessions and UI for faster development, or for full control over your stack, you can also connect directly using any standard WebRTC client or raw WebSockets</p></li><li><p><b>Integrate</b> with the <a href="https://developers.cloudflare.com/agents/"><u>Cloudflare Agents SDK</u></a></p></li></ul><p>During the open beta starting today, Cloudflare Realtime Agents runtime is free to use and works with various AI models:</p><ul><li><p>Speech and Audio: Integration with platforms like ElevenLabs and Deepgram.</p></li><li><p>LLM Inference: Flexible options to use large language models through Cloudflare Workers AI and AI Gateway, connect to third-party models like OpenAi, Gemini, Grok, Claude, or bring your own custom models.</p></li></ul>
    <div>
      <h2>Pipe raw WebRTC audio as PCM in Workers</h2>
      <a href="#pipe-raw-webrtc-audio-as-pcm-in-workers">
        
      </a>
    </div>
    <p>For developers who need the most flexibility with their applications beyond Realtime Agents, we're exposing the raw WebRTC audio pipeline directly to Workers. </p><p>WebRTC audio in Workers works by leveraging Cloudflare’s Realtime SFU, which converts WebRTC audio in Opus codec to PCM and streams it to any WebSocket endpoint you specify. This means you can use Workers to implement:</p><ul><li><p><b>Live transcription</b> - Stream audio from a video call directly to a transcription service</p></li><li><p><b>Custom AI pipelines</b> - Send audio to AI models without setting up complex infrastructure</p></li><li><p><b>Recording and processing</b> - Save, audit, or analyze audio streams in real-time</p></li></ul>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2b15xG5EfUiNYLtH8cNRTh/116f1e195cada59a61874c74ee499159/image2.png" />
          </figure>
    <div>
      <h3>WebSockets vs WebRTC for voice AI</h3>
      <a href="#websockets-vs-webrtc-for-voice-ai">
        
      </a>
    </div>
    <p>WebSockets and WebRTC can handle audio for AI services, but they work best in different situations. WebSockets are perfect for server-to-server communication and work fine when you don't need super-fast responses, making them great for testing and experimenting. However, if you're building an app where users need real-time conversations with low delay, WebRTC is the better choice.</p><p>WebRTC has several advantages that make it superior for live audio streaming. It uses UDP instead of TCP, which prevents audio delays caused by lost packets holding up the entire stream (<a href="https://blog.cloudflare.com/the-road-to-quic/#head-of-line-blocking"><u>head of line blocking</u></a> is a common topic discussed on this blog). The Opus audio codec in WebRTC automatically adjusts to network conditions and can handle packet loss gracefully. WebRTC also includes built-in features like echo cancellation and noise reduction that WebSockets would require you to build separately. </p><p>With this feature, you can use WebRTC for client to server communication and leveraging Cloudflare to convert to familiar WebSockets for server-to-server communication and backend processing.</p>
    <div>
      <h3>The power of Workers + WebRTC</h3>
      <a href="#the-power-of-workers-webrtc">
        
      </a>
    </div>
    <p>When WebRTC audio gets converted to WebSockets, you get PCM audio at the original sample rate, and from there, you can run any task in and out of the Cloudflare developer platform:</p><ul><li><p>Resample audio and send to different AI providers</p></li><li><p>Run WebAssembly-based audio processing</p></li><li><p>Build complex applications with <a href="https://developers.cloudflare.com/durable-objects/"><u>Durable Objects</u></a>, <a href="https://developers.cloudflare.com/durable-objects/api/alarms/"><u>Alarms</u></a> and other Workers primitives</p></li><li><p>Deploy containerized processing pipelines with <a href="https://developers.cloudflare.com/containers/"><u>Workers Containers</u></a></p></li></ul><p>The WebSocket works bidirectionally, so data sent back on the WebSocket becomes available as a WebRTC track on the Realtime SFU, ready to be consumed within WebRTC.</p><p>To illustrate this setup, we’ve made a simple <a href="https://github.com/cloudflare/realtime-examples/tree/main/tts-ws"><u>WebRTC application demo</u></a> that uses the ElevenLabs API for  text-to-speech.</p><p>Visit the <a href="https://developers.cloudflare.com/realtime/sfu/"><u>Realtime SFU developer docs</u></a> on how to get started.</p>
    <div>
      <h2>Realtime AI inference with WebSockets</h2>
      <a href="#realtime-ai-inference-with-websockets">
        
      </a>
    </div>
    <p>WebSockets provide the backbone of real-time AI pipelines because it is a low-latency, bidirectional primitive with ubiquitous support in developer tooling, especially for server to server communication. Although HTTP works great for many use cases like chat or batch inference, real-time voice AI needs persistent, low-latency connections when talking to AI inference servers. To support your real-time AI workloads, Workers AI now supports WebSocket connections in select models.</p>
    <div>
      <h3>Launching with PipeCat SmartTurn V2</h3>
      <a href="#launching-with-pipecat-smartturn-v2">
        
      </a>
    </div>
    <p>The first model with WebSocket support is PipeCat's <a href="https://developers.cloudflare.com/workers-ai/models/smart-turn-v2/"><u>smart-turn-v2</u></a> turn detection model — a critical component for natural conversation. Turn detection models determine when a speaker has finished talking and it's appropriate for the AI to respond. Getting this right is the difference between an AI that constantly interrupts and one that feels natural to talk to.</p><p>Below is an example on how to call smart-turn-v2 running on Workers AI.</p>
            <pre><code>"""
Cloudflare AI WebSocket Inference - With PipeCat's smart-turn-v2
"""

import asyncio
import websockets
import json
import numpy as np

# Configuration
ACCOUNT_ID = "your-account-id"
API_TOKEN = "your-api-token"
MODEL = "@cf/pipecat-ai/smart-turn-v2"

# WebSocket endpoint
WEBSOCKET_URL = f"wss://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/{MODEL}?dtype=uint8"

async def run_inference(audio_data: bytes) -&gt; dict:
    async with websockets.connect(
        WEBSOCKET_URL,
        additional_headers={
            "Authorization": f"Bearer {API_TOKEN}"
        }
    ) as websocket:
        await websocket.send(audio_data)
        
        response = await websocket.recv()
        result = json.loads(response)
        
        # Response format: {'is_complete': True, 'probability': 0.87}
        return result

def generate_test_audio():    
    noise = np.random.normal(128, 20, 8192).astype(np.uint8)
    noise = np.clip(noise, 0, 255) 
    
    return noise

async def demonstrate_inference():
    # Generate test audio
    noise = generate_test_audio()
    
    try:
        print("\nTesting noise...")
        noise_result = await run_inference(noise.tobytes())
        print(f"Noise result: {noise_result}")
        
    except Exception as e:
        print(f"Error: {e}")

if __name__ == "__main__":
    asyncio.run(demonstrate_inference())</code></pre>
            
    <div>
      <h2>Deepgram in Workers AI</h2>
      <a href="#deepgram-in-workers-ai">
        
      </a>
    </div>
    <p>On Wednesday, we announced that Deepgram's speech-to-text and text-to-speech models are available on Workers AI, running in Cloudflare locations worldwide. This means:</p><ul><li><p><b>Lower latency</b> - Speech recognition happens at the edge, close to users running in the same network as Workers</p></li><li><p><b>WebRTC audio processing</b> without leaving the Cloudflare network</p></li><li><p><b>State-of-the-art audio ML models</b> powerful, capable, and fast audio models, available directly through Workers AI</p></li><li><p><b>Global scale</b> - leverages Cloudflare’s global network in 330+ cities automatically</p></li></ul><p>Deepgram is a popular choice for voice AI applications. By building your voice AI systems on the Cloudflare platform, you get access to powerful models and the lowest latency infrastructure to give your application a natural, responsive experience.</p>
    <div>
      <h3>Interested in other realtime AI models running on Cloudflare?</h3>
      <a href="#interested-in-other-realtime-ai-models-running-on-cloudflare">
        
      </a>
    </div>
    <p>If you're developing AI models for real-time applications, we want to run them on Cloudflare's network. Whether you have proprietary models or need ultra-low latency inference at scale with open source models reach out to us.</p>
    <div>
      <h2>Get started today</h2>
      <a href="#get-started-today">
        
      </a>
    </div>
    <p>All of these features are available now:</p><ul><li><p><b>Cloudflare Realtime Agents</b> - <a href="https://developers.cloudflare.com/realtime/agents/getting-started/"><u>Start testing in beta</u></a></p></li><li><p><b>WebRTC audio as PCM in Workers</b> - <a href="https://developers.cloudflare.com/realtime/sfu/"><u>Read the documentation</u></a> and integrate with your applications</p></li><li><p><b>Workers AI WebSocket support</b> - Try out PipeCat’s <a href="https://developers.cloudflare.com/workers-ai/models/smart-turn-v2/"><u>smart-turn-v2</u></a> model</p></li><li><p><a href="https://blog.cloudflare.com/workers-ai-partner-models/"><b><u>Deepgram on Workers AI</u></b></a> - Available now at <a href="https://developers.cloudflare.com/workers-ai/models/aura-1/"><u>@cf/deepgram/aura-1</u></a> and <a href="https://developers.cloudflare.com/workers-ai/models/nova-3/"><u>@cf/deepgram/nova-3 </u></a></p></li></ul><p>Want to pick the brains of the engineers who built this? Join them for technical deep dives, live demos Q&amp;A at Cloudflare Connect in Las Vegas. Explore the <a href="https://events.cloudflare.com/connect/2025/"><u>full schedule and register</u></a>.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6wpPvADZYXKpbuqXcJWGfn/0c93500141d1f8dd443c04e5e3d69155/image3.png" />
          </figure><p>
</p> ]]></content:encoded>
            <category><![CDATA[AI Week]]></category>
            <category><![CDATA[AI]]></category>
            <guid isPermaLink="false">4AaIT3iiPV1cfuh2FxoUgq</guid>
            <dc:creator>Renan Dincer</dc:creator>
        </item>
        <item>
            <title><![CDATA[Troubleshooting network connectivity and performance with Cloudflare AI]]></title>
            <link>https://blog.cloudflare.com/AI-troubleshoot-warp-and-network-connectivity-issues/</link>
            <pubDate>Fri, 29 Aug 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Troubleshoot network connectivity issues by using Cloudflare AI-Power to quickly self diagnose and resolve WARP client and network issues. ]]></description>
            <content:encoded><![CDATA[ <p>Monitoring a corporate network and troubleshooting any performance issues across that network is a hard problem, and it has become increasingly complex over time. Imagine that you’re maintaining a corporate network, and you get the dreaded IT ticket. An executive is having a performance issue with an application, and they want you to look into it. The ticket doesn’t have a lot of details. It simply says: “Our internal documentation is taking forever to load. PLS FIX NOW”.</p><p>In the early days of IT, a corporate network was built on-premises. It provided network connectivity between employees that worked in person and a variety of corporate applications that were hosted locally.</p><p>The shift to cloud environments, the rise of SaaS applications, and a “work from anywhere” model has made IT environments significantly more complex in the past few years. Today, it’s hard to know if a performance issue is the result of:</p><ul><li><p>An employee’s device</p></li><li><p>Their home or corporate wifi</p></li><li><p>The corporate network</p></li><li><p>A cloud network hosting a SaaS app</p></li><li><p>An intermediary ISP</p></li></ul><p>A performance ticket submitted by an employee might even be a combination of multiple performance issues all wrapped together into one nasty problem.</p><p>Cloudflare built <a href="https://developers.cloudflare.com/cloudflare-one/"><u>Cloudflare One</u></a>, our <a href="https://www.cloudflare.com/learning/access-management/what-is-sase/">Secure Access Service Edge (SASE) </a>platform, to protect enterprise applications, users, devices, and networks. In particular, this platform relies on two capabilities to simplify troubleshooting performance issues:</p><ul><li><p>Cloudflare’s Zero Trust client, also known as <a href="https://developers.cloudflare.com/cloudflare-one/connections/connect-devices/warp/"><u>WARP</u></a>, forwards and encrypts traffic from devices to Cloudflare edge.</p></li><li><p>Digital Experience Monitoring (<a href="https://developers.cloudflare.com/cloudflare-one/insights/dex/"><u>DEX</u></a>) works alongside WARP to monitor device, network, and application performance.</p></li></ul><p>We’re excited to announce two new AI-powered tools that will make it easier to troubleshoot WARP client connectivity and performance issues.  We’re releasing a new WARP diagnostic analyzer in the <a href="https://www.cloudflare.com/learning/security/glossary/what-is-zero-trust/">Zero Trust</a> dashboard and a <a href="https://www.cloudflare.com/learning/ai/what-is-model-context-protocol-mcp/"><u>MCP (Model Context Protocol)</u></a> server for DEX. Today, every Cloudflare One customer has free access to both of these new features by default.</p>
    <div>
      <h2>WARP diagnostic analyzer</h2>
      <a href="#warp-diagnostic-analyzer">
        
      </a>
    </div>
    <p>The WARP client provides diagnostic logs that can be used to troubleshoot connectivity issues on a device. For desktop clients, the most common issues can be investigated with the information captured in logs called <a href="https://developers.cloudflare.com/learning-paths/warp-overview-course/series/warp-basics-2/"><u>WARP diagnostic</u></a>. Each WARP diagnostic log contains an extensive amount of information spanning days of captured events occurring on the client. It takes expertise to manually go through all of this information and understand the full picture of what is occurring on a client that is having issues. In the past, we’ve advised customers having issues to send their WARP diagnostic log straight to us so that our trained support experts can do a root cause analysis for them. While this is effective, we want to give our customers the tools to take control of deciphering common troubleshooting issues for even quicker resolution. </p><p>Enter the WARP diagnostic analyzer, a new AI available for free in the Cloudflare One dashboard as of today! This AI demystifies information in the WARP diagnostic log so you can better understand events impacting the performance of your clients and network connectivity. Now, when you run a <a href="https://developers.cloudflare.com/cloudflare-one/insights/dex/remote-captures/"><u>remote capture for WARP diagnostics</u></a> in the Cloudflare One dashboard, you can generate an <a href="https://developers.cloudflare.com/cloudflare-one/connections/connect-devices/warp/troubleshooting/warp-logs/#view-warp-diagnostics-summary-beta"><u>AI analysis of the WARP diagnostic file</u></a>. Simply go to your organization’s Zero Trust dashboard and select DEX &gt; Remote Captures from the side navigation bar. After you successfully run diagnostics and produce a WARP diagnostic file, you can open the status details and select View WARP Diag to generate your AI analysis.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/50lz9CFKKJJjL5GpppLu8V/4b404a2ec700713579b3ec9a616ee4c4/image4.png" />
          </figure><p>In the WARP Diag analysis, you will find a Cloudy summary of events that we recommend a deeper dive into.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6rV0XPL9aayuljbw9X46bQ/6fd046dfcf6d882948d1a98912cf7cab/image1.png" />
          </figure><p>Below this summary is an events section, where the analyzer highlights occurrences of events commonly occurring when there are client and connectivity issues. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4OxLtM2CQ4SSs8NTGUdcpn/b7e4f0e3eb519838d50759e6d1decf75/image7.png" />
          </figure><p>Expanding on any of the events detected will reveal a detailed page explaining the event, recommended resources to help troubleshoot, and a list of time stamped recent occurrences of the event on the device. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4ceezR6L1MybxhMtJGuL5U/31f24b0a057871a1f4330ea87f050873/Screenshot_2025-09-03_at_4.20.27%C3%A2__PM.png" />
          </figure><p>To further help with trouble shooting we’ve added a Device and WARP details section at the bottom of this page with a quick view of the device specifications and WARP configurations such as Operating system, WARP version, and the device profile ID.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/41N2iTeHQ9JfrOOsqG8MY5/550fa7573a6d4ed61479679cb4e954d3/image6.png" />
          </figure><p>Finally, we’ve made it easy to take all the information created in your AI summary with you by navigating to the JSON file tab and copying the contents. Your WARP Diag file is also available to download from this screen for any further analysis.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1Sha8rpC7XwSkCvBWt6lv2/2702873ce14fe80904d4f0886e6f3528/image2.png" />
          </figure>
    <div>
      <h2>MCP server for DEX</h2>
      <a href="#mcp-server-for-dex">
        
      </a>
    </div>
    <p>Alongside the new WARP Diagnostic Analyzer, we’re excited to announce that all Cloudflare One customers have access to a MCP (Model Context Protocol) server for our Digital Experience Monitoring (DEX) product. Let’s dive into how this will save our customers time and money.</p><p>Cloudflare One customers use Digital Experience Monitoring (DEX) to monitor devices across their employee network and troubleshoot any connectivity or performance issues. Like many products at Cloudflare, every data point generated by DEX is available to customers via Cloudflare’s API or log ingestion. DEX API and log data is valuable because it enables Enterprises to create custom analytics for their devices’ connectivity and performance in a SIEM (<a href="https://www.cloudflare.com/learning/security/what-is-siem/"><u>Security Information and Event Management</u></a>). Building new data pipelines and various dashboards can take a lot of time and can be expensive. Some of Cloudflare One customers aren’t able to dedicate the engineering time to build custom analytics: whether it’s due to budget, resource constraints, time constraints, or other factors.</p><p>Model Context Protocol (MCP) is an AI standardization that allows AI to connect to other applications. In this instance, using MCP together with DEX gives users a custom analytics experience — without all the work of integrating log data into a SIEM. Network security practitioners can type out a question (like “Show me the performance data for alice@acme.com’s device in a few graphs”), and an MCP server will answer that question with a customized graph on Alice’s device data from the DEX API.</p>
    <div>
      <h2>Troubleshooting DNS query performance with the DEX MCP server</h2>
      <a href="#troubleshooting-dns-query-performance-with-the-dex-mcp-server">
        
      </a>
    </div>
    <p>Any IT admin, security engineer, or network engineer can ask the MCP server a question like: “I received an IT ticket from bob@acme.com who is an enterprise employee. He is complaining about the performance of his device. Can you investigate this for me?”.</p><p>After you submit this question, the DEX MCP server will start by reviewing the device’s overall health.</p>
            <pre><code>Overall Device Health:
✅ Connection Status: Consistently connected for the past 24 hours with 100% availability
✅ WARP Mode: Running in optimal "warp+doh" (WARP + DNS over HTTPS) mode
✅ Client Version: Up-to-date with version 2025.1.7.2</code></pre>
            <p>So far, the device’s health looks great. Next, the DEX MCP server will begin analyzing the results of the performance tests that are configured on Bob’s device.</p>
            <pre><code>📊 Performance Metrics Analysis

HTTP Response Times:
Google.com test: Average 247ms (Range: 81-649ms)
  • Shows significant improvement throughout the day: Started at ~317ms, dropped to ~211ms by evening
Wiki.internal.acme.com: Average 1,374ms (Range: 877-2,431ms)
  • This is notably high and could be the source of performance complaints
  • DNS resolution particularly slow: Average 616ms with spikes up to 1,600ms</code></pre>
            <p>Now we’re making progress in our troubleshooting effort. The DEX MCP server identified that the HTTP GET test for wiki.internal.acme.com has a high average resource fetch time when compared to other websites (like google.com). In particular, high resource fetch time is the result of slow DNS resolution. Now, the DEX MCP server will summarize its findings.</p>
            <pre><code>🚨 Performance Issues Identified
Extremely Slow DNS Resolution:
  • Average DNS response time of 616ms is significantly high
  • Normal DNS should be &lt;50ms for good performance
High Resource Fetch Times on wiki.internal.acme.com:
   • 1.3+ second response times indicate potential application-level issues
   • Could be related to DNS performance or server-side delays
HTTP Status Code Issues:
        • wiki.internal.acme.com is showing consistent HTTP 400 errors
        • This suggests potential configuration or routing issues</code></pre>
            <p>Now, we’ve successfully identified that Bob’s slow wiki performance is the result of a high average DNS response time. Slow DNS resolution increases the average resource fetch time, and also generates HTTP 400 errors for Enterprise employees that are attempting to access wiki.internal.acme.com. Notably, it’s likely that if the wiki is seeing slow DNS response times, other internal applications are experiencing performance issues as well.</p><p>We can ask the DEX MCP server to suggest potential solutions that would fix this wider DNS performance issue.</p>
            <pre><code>🔧 DNS Optimization Strategies

Check WARP DNS configuration and settings
  • Since the device is using WARP+DoH mode, you can optimize the DNS settings
Implement or improve local DNS caching
  • Implement or improve local DNS caching to reduce repeated lookups
Network stack tuning
  • Optimize TCP / UDP settings for DNS</code></pre>
            
    <div>
      <h2>Try out the DEX MCP server today</h2>
      <a href="#try-out-the-dex-mcp-server-today">
        
      </a>
    </div>
    
    <div>
      <h3>Fast and easy option for testing an MCP server</h3>
      <a href="#fast-and-easy-option-for-testing-an-mcp-server">
        
      </a>
    </div>
    <p>Any Cloudflare One customer with a Free, PayGo, or ENT plan can start using the DEX MCP server in less than one minute. The fastest and easiest way to try out the DEX MCP server is to visit <a href="http://playground.ai.cloudflare.com"><u>playground.ai.cloudflare.com</u></a>. There are five steps to get started:</p><ol><li><p>Copy the URL for the DEX MCP server: https://dex.mcp.cloudflare.com/sse</p></li><li><p>Open <a href="http://playground.ai.cloudflare.com"><u>playground.ai.cloudflare.com</u></a> in a browser</p></li><li><p>Find the section in the left side bar titled <b>MCP Servers</b></p></li><li><p>Paste the URL for the DEX MCP server into the URL input box and click <b>Connect</b></p></li><li><p>Authenticate your Cloudflare account, and then start asking questions to the DEX MCP server</p></li></ol><p>It’s worth noting that end users will need to ask specific and explicit questions to the DEX MCP server to get a response. For example, you may need to say, “Set my production account as the active  account”, and then give the separate command, “Fetch the DEX test results for the user bob@acme.com over the past 24 hours”.</p>
    <div>
      <h3>Better experience for MCP servers that requires additional steps</h3>
      <a href="#better-experience-for-mcp-servers-that-requires-additional-steps">
        
      </a>
    </div>
    <p>Customers will get a more flexible prompt experience by configuring the DEX MCP server with their preferred AI assistant (Claude, Gemini, ChatGPT, etc.) that has MCP server support. MCP server support may require a subscription for some AI assistants. You can read the <a href="https://developers.cloudflare.com/cloudflare-one/insights/dex/dex-mcp-server"><u>Digital Experience Monitoring - MCP server documentation</u></a> for step by step instructions on how to get set up with each of the major AI assistants that are available today.</p><p>As an example, you can configure the DEX MCP server in Claude by downloading the Claude Desktop client, then selecting Claude Code &gt; Developer &gt; Edit Config. You will be prompted to open “claude_desktop_config.json” in a code editor of your choice. Simply add the following JSON configuration, and you’re ready to use Claude to call the DEX MCP server.</p>
            <pre><code>{
  "globalShortcut": "",
  "mcpServers": {
    "cloudflare-dex-analysis": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://dex.mcp.cloudflare.com/sse"
      ]
    }
  }
}</code></pre>
            
    <div>
      <h2>Get started with Cloudflare One today</h2>
      <a href="#get-started-with-cloudflare-one-today">
        
      </a>
    </div>
    <p>Are you ready to secure your Internet traffic, employee devices, and private resources without compromising speed? You can get started with our new Cloudflare One AI powered tools today.</p><p>The WARP diagnostic analyzer and the DEX MCP server are generally available to all customers. Head to the Zero Trust dashboard to run a WARP diagnostic and learn more about your client’s connectivity with the WARP diagnostic analyzer. You can test out the new DEX MCP server (https://dex.mcp.cloudflare.com/sse) in less than one minute at <a href="http://playground.ai.cloudflare.com"><u>playground.ai.cloudflare.com</u></a>, and you can also configure an AI assistant like Claude to use the new <a href="https://developers.cloudflare.com/cloudflare-one/insights/dex/dex-mcp-server"><u>DEX MCP server</u></a>.</p><p>If you don’t have a Cloudflare account, and you want to try these new features, you can create a free account for up to 50 users. If you’re an Enterprise customer, and you’d like a demo of these new Cloudflare One AI features, you can reach out to your account team to set up a demo anytime. </p><p>You can stay up to date on latest feature releases across the Cloudflare One platform by following the <a href="https://developers.cloudflare.com/cloudflare-one/changelog/"><u>Cloudflare One changelogs</u></a> and joining the conversation in the <a href="https://community.cloudflare.com/"><u>Cloudflare community hub</u></a> or on our <a href="https://discord.cloudflare.com/"><u>Discord Server</u></a>.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/CvbpyPLYM62H7B0GhGqcZ/79317635029a9d09d31dacbec6793887/image5.png" />
          </figure><div>
  
</div><p></p> ]]></content:encoded>
            <category><![CDATA[AI Week]]></category>
            <category><![CDATA[Monitoring]]></category>
            <category><![CDATA[Analytics]]></category>
            <category><![CDATA[WARP]]></category>
            <category><![CDATA[Device Security]]></category>
            <category><![CDATA[Performance]]></category>
            <category><![CDATA[Dashboard]]></category>
            <category><![CDATA[Zero Trust]]></category>
            <category><![CDATA[Cloudflare One]]></category>
            <category><![CDATA[AI]]></category>
            <guid isPermaLink="false">7vSTlKJvMibVnsLp1YLWKe</guid>
            <dc:creator>Chris Draper</dc:creator>
            <dc:creator>Koko Uko</dc:creator>
        </item>
        <item>
            <title><![CDATA[The crawl-to-click gap: Cloudflare data on AI bots, training, and referrals]]></title>
            <link>https://blog.cloudflare.com/crawlers-click-ai-bots-training/</link>
            <pubDate>Fri, 29 Aug 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ By mid-2025, training drives nearly 80% of AI crawling, while referrals to publishers (especially from Google) are falling and crawl-to-refer ratios show AI consumes far more than it sends back. ]]></description>
            <content:encoded><![CDATA[ <p>In 2025, Generative AI is reshaping how people and companies use the Internet. Search engines once drove traffic to content creators through links. Now, AI training crawlers — the engines behind commonly-used LLMs — are consuming vast amounts of web data, while sending far fewer users back. We covered this shift, along with related <a href="https://blog.cloudflare.com/from-googlebot-to-gptbot-whos-crawling-your-site-in-2025/"><u>trends</u></a> and Cloudflare <a href="https://blog.cloudflare.com/tag/pay-per-crawl/"><u>features</u></a> (like pay per crawl) in early July. Studies from Pew Research Center (<a href="https://www.pewresearch.org/short-reads/2025/04/28/americans-largely-foresee-ai-having-negative-effects-on-news-journalists/"><u>1</u></a>, <a href="https://www.pewresearch.org/short-reads/2025/07/22/google-users-are-less-likely-to-click-on-links-when-an-ai-summary-appears-in-the-results/"><u>2</u></a>) and <a href="https://pressgazette.co.uk/media-audience-and-business-data/google-ai-overviews-publishers-report-clickthroughs-authoritas-report/"><u>Authoritas</u></a> already point to AI overviews — Google’s new AI-generated summaries shown at the top of search results — contributing to sharp declines in news website traffic. For a news site, this means lots of bot hits, but far fewer real readers clicking through — which in turn means fewer people clicking on ads or chances to convert to subscriptions.</p><p>Cloudflare's data shows the same pattern. Crawling by search engines and AI services surged in the first half of 2025 — up 24% year-over-year in June — before slowing to just 4% year-over-year growth in July. How is the space evolving? Which crawling purposes are most common, and how is that changing? Spoiler: training-related crawling is leading the way. In this post, we track AI and search bot crawl activity, what purposes dominate, and which platforms contribute the least referral traffic back to creators.</p>
    <div>
      <h3>Key takeaways</h3>
      <a href="#key-takeaways">
        
      </a>
    </div>
    <ul><li><p>Training crawling grows: Training now drives nearly 80% of AI bot activity, up from 72% a year ago.</p></li><li><p>Publisher referrals drop: Google referrals to news sites fell, with March 2025 down ~9% compared to January.</p></li><li><p>AI &amp; search crawling increase: Crawling rose 32% year-over-year in April 2025, before slowing to 4% year-over-year growth in July.</p></li><li><p>AI-only crawler shifts: OpenAI’s GPTBot more than doubled in share of AI crawling traffic (4.7% to 11.7%), Anthropic’s ClaudeBot rose (6% to ~10%), while ByteDance’s Bytespider fell from 14.1% to 2.4%.</p></li><li><p>Crawl-to-refer imbalance (how many pages a bot crawls per page that a user clicks back to): Anthropic increased referrals but still leads with 38,000 crawls per visitor in July (down from 286,000:1 in January). Perplexity decreased referrals in 2025 — with more crawling but fewer referrals at 194 crawls per visitor in July.</p></li></ul><p>Several of the trends in this blog use <a href="https://radar.cloudflare.com/ai-insights"><u>Cloudflare Radar’s new AI Insights</u></a> features, explained in more detail in the post: “<a href="http://blog.cloudflare.com/ai-crawler-traffic-by-purpose-and-industry"><b><u>A deeper look at AI crawlers: breaking down traffic by purpose and industry</u></b></a>.”</p>
    <div>
      <h2>Google referrals fall as AI Overviews expand</h2>
      <a href="#google-referrals-fall-as-ai-overviews-expand">
        
      </a>
    </div>
    <p>Referral traffic from search is already shifting, as we noted above and as <a href="http://studies"><u>studies</u></a> have shown. In our dataset of news-related customers (spanning the Americas, Europe, and Asia), Google’s referrals have been clearly declining since February 2025. This drop is unusual, since overall Internet traffic (and referrals as well) historically has only dipped during July and August — the summer months when the Northern Hemisphere is largely on break from school or work. The sharpest and least seasonal decline came in March. Despite being a 31-day month, March had almost the same referral volume as the shorter, 28-day February.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1ZWlDsTAtPveEo2Kq8nzu9/ebd655d9ea51f35cfae1f4d09cfecc76/1.png" />
          </figure><p>Looking at longer comparisons: March 2025 referral traffic from Google was 9% lower than January, the same drop seen in June. April was worse, down 15% compared with January.</p><p>This drop seems to coincide with some of Google’s changes. AI Overviews launched in the U.S. in <a href="https://blog.google/products/search/generative-ai-google-search-may-2024/"><u>May 2024</u></a>, but in March 2025, Google upgraded AI Overviews with Gemini 2.0, introduced AI Mode in Labs, and <a href="https://blog.google/feed/were-bringing-the-helpfulness-of-ai-overviews-to-more-countries-in-europe/"><u>expanded</u></a> Overviews to more European countries. By May 2025, AI Mode rolled out broadly in the U.S. with Gemini 2.5, adding conversational search, Deep Search, and personalized recommendations.</p><p>The search-to-news site pipeline seems to be weakening, replaced in part by AI-driven results.</p><p>Looking at a daily perspective, we can also spot a clear U.S.-election-related peak in referrals from Google to the cohort of known news sites on November 5–6, 2024.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1Gtq4mnTg8KdVWaUkpH51A/86e7f7dfeb31f846df4ae8486c25b4aa/2.png" />
          </figure>
    <div>
      <h2>AI and search crawling: spring surge (+24%), summer slowdown</h2>
      <a href="#ai-and-search-crawling-spring-surge-24-summer-slowdown">
        
      </a>
    </div>
    <p><a href="https://blog.cloudflare.com/from-googlebot-to-gptbot-whos-crawling-your-site-in-2025/"><u>In June</u></a>, we talked about search and AI crawler growth, and our picture of the trend is now more complete with more data. To focus only on AI and search crawlers, and to remove the bias of customer growth, we analyzed a fixed set of customers from specific weeks, a method we’ve also used in the <a href="http://radar.cloudflare.com/year-in-review/"><u>Cloudflare Radar Year in Review</u></a>.</p><p>What the data shows: crawling spiked twice: first in November 2024, then again between March and April 2025. April 2025 alone was up 32% compared with May 2024, the first full month where we have comparable data. After that surge, growth stabilized. In June 2025, crawling traffic was still 24% higher year-over-year, but by July the increase was down to just 4%. That shift highlights how quickly crawler activity can accelerate and then cool down.</p><p>As the chart below shows, crawling traffic rose sharply in March and April. It remained high but slightly lower in May, before starting to drop in June. The seasonal dip is similar to what we see in overall Internet traffic during the Northern Hemisphere’s summer months (August and September are often the quietest), though in the case of crawlers, this is likely due to reduced overall web activity rather than bots themselves taking a “break.” Historically, activity tends to rise again in November — as it did in 2024 for AI and search bot traffic — when people spend more time online for shopping and seasonal habits (a pattern we’ve seen in <a href="https://blog.cloudflare.com/from-deals-to-ddos-exploring-cyber-week-2024-internet-trends/"><u>past years</u></a>).</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1SKJcH4r7smlgCBC9vjULt/1311a9ded068a142122630af5afc3766/3.png" />
          </figure><p>Googlebot is <a href="https://blog.cloudflare.com/from-googlebot-to-gptbot-whos-crawling-your-site-in-2025/"><u>still</u></a> the anchor, accounting for 39% of all AI and search crawler traffic, but the fastest growth now comes from AI-specific crawlers, though bots related to Amazon and ByteDance (Bytespider) have lost significant ground. GPTBot’s share grew from 4.7% in July 2024 to 11.7% in July 2025. ClaudeBot also increased, from 6% to nearly 10%, while Meta’s crawler jumped from 0.9% to 7.5%. By contrast, Amazonbot dropped from 10.2% to 5.9%, and ByteDance’s Bytespider dropped from 14.1% to just 2.4%.</p><p>The table below shows how market shares have shifted between July 2024 and July 2025:</p><table><tr><td><p>
</p></td><td><p><b>Bot name</b></p></td><td><p><b>% share July 2024</b></p></td><td><p><b>% share July 2025</b></p></td><td><p><b>Δ percentage-point change</b></p></td></tr><tr><td><p><b>1</b></p></td><td><p>Googlebot</p></td><td><p>37.5</p></td><td><p>39</p></td><td><p>1.5</p></td></tr><tr><td><p><b>2</b></p></td><td><p>GPTBot</p></td><td><p>4.7</p></td><td><p>11.7</p></td><td><p>7</p></td></tr><tr><td><p><b>3</b></p></td><td><p>ClaudeBot</p></td><td><p>6</p></td><td><p>9.9</p></td><td><p>3.9</p></td></tr><tr><td><p><b>4</b></p></td><td><p>Bingbot</p></td><td><p>8.7</p></td><td><p>9.3</p></td><td><p>0.6</p></td></tr><tr><td><p><b>5</b></p></td><td><p>Meta-ExternalAgent</p></td><td><p>0.9</p></td><td><p>7.5</p></td><td><p>6.5</p></td></tr><tr><td><p><b>6</b></p></td><td><p>Amazonbot</p></td><td><p>10.2</p></td><td><p>5.9</p></td><td><p>-4.3</p></td></tr><tr><td><p><b>7</b></p></td><td><p>Googlebot-Image</p></td><td><p>4.1</p></td><td><p>3.3</p></td><td><p>-0.8</p></td></tr><tr><td><p><b>8</b></p></td><td><p>Yandex</p></td><td><p>5</p></td><td><p>2.9</p></td><td><p>-2.1</p></td></tr><tr><td><p><b>9</b></p></td><td><p>GoogleOther</p></td><td><p>4.6</p></td><td><p>2.7</p></td><td><p>-1.8</p></td></tr><tr><td><p><b>10</b></p></td><td><p>Bytespider</p></td><td><p>14.1</p></td><td><p>2.4</p></td><td><p>-11.6</p></td></tr><tr><td><p><b>11</b></p></td><td><p>Applebot</p></td><td><p>1.8</p></td><td><p>1.5</p></td><td><p>-0.3</p></td></tr><tr><td><p><b>12</b></p></td><td><p>ChatGPT-User</p></td><td><p>0.1</p></td><td><p>0.9</p></td><td><p>0.9</p></td></tr><tr><td><p><b>13</b></p></td><td><p>OAI-SearchBot</p></td><td><p>0</p></td><td><p>0.9</p></td><td><p>0.9</p></td></tr><tr><td><p><b>14</b></p></td><td><p>Baiduspider</p></td><td><p>0.5</p></td><td><p>0.5</p></td><td><p>0</p></td></tr><tr><td><p><b>15</b></p></td><td><p>Googlebot-Mobile</p></td><td><p>0.2</p></td><td><p>0.4</p></td><td><p>0.2</p></td></tr></table>
    <div>
      <h2>AI-only crawlers: OpenAI rises, ByteDance falls</h2>
      <a href="#ai-only-crawlers-openai-rises-bytedance-falls">
        
      </a>
    </div>
    <p>Looking only at AI bot traffic (as tracked on our <a href="https://radar.cloudflare.com/explorer?dataSet=ai.bots&amp;groupBy=user_agent&amp;dt=2025-07-01_2025-07-31&amp;timeCompare=2024-07-01"><u>Radar AI page</u></a>), the trend is clear. Since January 2025, GPTBot has steadily increased its crawling volume, driven mainly by training-related activity. ClaudeBot crawling accelerated in June, while Amazonbot and Bytespider activity slowed.</p><p>The <a href="https://radar.cloudflare.com/explorer?dataSet=ai.bots&amp;groupBy=user_agent&amp;dt=2025-07-01_2025-07-31&amp;timeCompare=2024-07-01"><u>chart</u></a> below shows how GPTBot surged over the past 12 months, overtaking Amazonbot and Bytespider, which both fell sharply:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5XRamYFTPqDrQ0bMQSG4C7/e741692f7019a4842b5d82bf4ab64106/4.png" />
          </figure><p>A comparison between July 2024 and July 2025 makes the shift even more obvious. GPTBot gained 16 percentage points, Meta’s crawler rose by more than 15, and ClaudeBot grew by 8. On the shrinking side, Amazonbot dropped 12 percentage points and Bytespider dropped over 31 percentage points.</p><table><tr><td><p>
</p></td><td><p><b>AI-only bots</b></p></td><td><p>July 2024 %</p></td><td><p>July 2025 %</p></td><td><p>Δ percentage-point change</p></td></tr><tr><td><p>1</p></td><td><p>GPTBot</p></td><td><p>11.9</p></td><td><p>28.1</p></td><td><p>16.1</p></td></tr><tr><td><p>2</p></td><td><p>ClaudeBot</p></td><td><p>15</p></td><td><p>23.3</p></td><td><p>8.3</p></td></tr><tr><td><p>3</p></td><td><p>Meta-ExternalAgent</p></td><td><p>2.4</p></td><td><p>17.7</p></td><td><p>15.3</p></td></tr><tr><td><p>4</p></td><td><p>Amazonbot</p></td><td><p>26.4</p></td><td><p>14.1</p></td><td><p>-12.3</p></td></tr><tr><td><p>5</p></td><td><p>Bytespider</p></td><td><p>37.3</p></td><td><p>5.8</p></td><td><p>-31.5</p></td></tr><tr><td><p>6</p></td><td><p>Applebot</p></td><td><p>4.9</p></td><td><p>3.7</p></td><td><p>-1.2</p></td></tr><tr><td><p>7</p></td><td><p>ChatGPT-User</p></td><td><p>0.2</p></td><td><p>2.4</p></td><td><p>2.2</p></td></tr><tr><td><p>8</p></td><td><p>OAI-SearchBot</p></td><td><p>0</p></td><td><p>2.2</p></td><td><p>2.2</p></td></tr><tr><td><p>9</p></td><td><p>TikTokSpider</p></td><td><p>0</p></td><td><p>0.7</p></td><td><p>0.7</p></td></tr><tr><td><p>10</p></td><td><p>imgproxy</p></td><td><p>0</p></td><td><p>0.7</p></td><td><p>0.7</p></td></tr><tr><td><p>11</p></td><td><p>PerplexityBot</p></td><td><p>0</p></td><td><p>0.4</p></td><td><p>0.4</p></td></tr><tr><td><p>12</p></td><td><p>Google-CloudVertexBot</p></td><td><p>0</p></td><td><p>0.3</p></td><td><p>0.3</p></td></tr><tr><td><p>13</p></td><td><p>AI2Bot</p></td><td><p>0</p></td><td><p>0.2</p></td><td><p>0.2</p></td></tr><tr><td><p>14</p></td><td><p>Timpibot</p></td><td><p>0.6</p></td><td><p>0.1</p></td><td><p>-0.5</p></td></tr><tr><td><p>15</p></td><td><p>CCBot</p></td><td><p>0.1</p></td><td><p>0.1</p></td><td><p>0</p></td></tr></table>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/71p4CgiUXwYrb9LIsJCruI/44dd4b232a715b852417853e7026fbcb/5.png" />
          </figure><p>We covered the functionality of these bots in our <a href="https://blog.cloudflare.com/from-googlebot-to-gptbot-whos-crawling-your-site-in-2025/#ai-only-crawlers-perspective"><u>June blog post</u></a>.</p>
    <div>
      <h2>Crawling by purpose: training dominates</h2>
      <a href="#crawling-by-purpose-training-dominates">
        
      </a>
    </div>
    <p>Training is the clear leader.<i> (We classify purpose based on operator disclosures and industry sources, a method we explained in this </i><a href="http://blog.cloudflare.com/ai-crawler-traffic-by-purpose-and-industry"><i><u>AI Week blog</u></i></a><i>.)</i> Over the past 12 months, 80% of AI crawling was for training, compared with 18% for search and just 2% for user actions. In the last six months, the share for training rose further to 82%, while search dropped to 15% and user actions increased slightly to 3%.</p><p>The <a href="https://radar.cloudflare.com/ai-insights#crawl-purpose"><u>chart</u></a> below shows how training-related crawling steadily grew over the past year, far outpacing other purposes:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/10lBzdfhgLKiWrEAIcs691/8b11d8d733c48938a7235dc07f65a83a/6.png" />
          </figure><p>The year-over-year comparison reinforces this trend. In July 2024, training accounted for 72% of AI crawling. By July 2025, it had risen to 79%. Over the same period, search fell from 26% to 17%, while user actions grew modestly from 2% to 3.2%.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2OcV2pA5nOBpOrl8pKPotL/4901f128d5feaba82357972509ba09f2/7.png" />
          </figure>
    <div>
      <h2>Crawl-to-refer ratios shifts: tens of thousands of bot crawls per human click</h2>
      <a href="#crawl-to-refer-ratios-shifts-tens-of-thousands-of-bot-crawls-per-human-click">
        
      </a>
    </div>
    <p>The crawl-to-refer ratio measures how many pages a platform crawls compared with how often it drives users to a website. In practice, a high ratio means heavy crawling but little referral traffic. For example, for every visitor Anthropic refers back to a website, its crawlers have already visited tens of thousands of pages.</p><p>Why does this metric matter? It highlights the imbalance between how much content AI systems consume and how little traffic they return. For publishers, it can feel like giving away the raw material for free. With that in mind, here’s how different platforms compare from January to July 2025.</p><p>Anthropic remains the most crawl-heavy platform. Even after an 87% decline this year, it still crawled 38,000 pages for every referred page visit in July 2025 — the highest imbalance among major AI players. Referrals may be improving, though, after Anthropic added <a href="https://www.anthropic.com/news/web-search"><u>web search to Claude in March 2025</u></a> (initially for U.S. paid users) and expanded it globally by <a href="https://www.brightedge.com/claude-search"><u>May to all users, including the free tier</u></a>. The feature introduced direct citations with clickable URLs, creating new referral pathways.</p><p>The full dataset is below, showing January–July 2025 ratios by platform ordered by the highest ratio average:
(Note: a rising ratio means <i>more</i> bot crawling per human click sent back, while a falling ratio means <i>less</i> bot crawling per human click sent back)

<b>Crawl-to-refer ratio (from </b><a href="https://radar.cloudflare.com/ai-insights?dateStart=2025-07-01&amp;dateEnd=2025-07-31#crawl-to-refer-ratio"><b><u>Cloudflare Radar’s data</u></b></a><b>)</b></p><table><tr><td><p><b>Service</b></p></td><td><p><b>Jan</b></p></td><td><p><b>Feb</b></p></td><td><p><b>Mar</b></p></td><td><p><b>Apr</b></p></td><td><p><b>May</b></p></td><td><p><b>Jun</b></p></td><td><p><b>Jul</b></p></td><td><p><b>Average</b></p></td><td><p><b>% Change Jan-Jul</b></p></td></tr><tr><td><p><b>Anthropic</b></p></td><td><p>286,930.1</p></td><td><p>271,748.2</p></td><td><p>121,612.7</p></td><td><p>130,330.2</p></td><td><p>114,313</p></td><td><p>71,282.8</p></td><td><p>38,065.7</p></td><td><p>147,754.7</p></td><td><p>-86.7%</p></td></tr><tr><td><p><b>OpenAI</b></p></td><td><p>1,217.4</p></td><td><p>1,774.5</p></td><td><p>2,217</p></td><td><p>1200</p></td><td><p>995.6</p></td><td><p>1,655.9</p></td><td><p>1,091.4</p></td><td><p>1,437.8</p></td><td><p>-10.4%</p></td></tr><tr><td><p><b>Perplexity</b></p></td><td><p>54.6</p></td><td><p>55.3</p></td><td><p>201.3</p></td><td><p>300.9</p></td><td><p>199.1</p></td><td><p>200.6</p></td><td><p>194.8</p></td><td><p>172.4</p></td><td><p>256.7%</p></td></tr><tr><td><p><b>Microsoft</b></p></td><td><p>38.5</p></td><td><p>44.2</p></td><td><p>42.3</p></td><td><p>43.3</p></td><td><p>45.1</p></td><td><p>42</p></td><td><p>40.7</p></td><td><p>42.3</p></td><td><p>5.7%</p></td></tr><tr><td><p><b>Yandex</b></p></td><td><p>15.5</p></td><td><p>13.1</p></td><td><p>13.1</p></td><td><p>15.7</p></td><td><p>14.7</p></td><td><p>15.9</p></td><td><p>21.4</p></td><td><p>15.6</p></td><td><p>38.3%</p></td></tr><tr><td><p><b>Google</b></p></td><td><p>3.8</p></td><td><p>6.3</p></td><td><p>14.6</p></td><td><p>22.5</p></td><td><p>16.7</p></td><td><p>13.1</p></td><td><p>5.4</p></td><td><p>11.8</p></td><td><p>43%</p></td></tr><tr><td><p><b>ByteDance</b></p></td><td><p>18</p></td><td><p>16.4</p></td><td><p>3.5</p></td><td><p>2.3</p></td><td><p>1.6</p></td><td><p>1.6</p></td><td><p>0.9</p></td><td><p>6.3</p></td><td><p>-95%</p></td></tr><tr><td><p><b>Baidu</b></p></td><td><p>0.6</p></td><td><p>0.7</p></td><td><p>0.8</p></td><td><p>1.5</p></td><td><p>1.2</p></td><td><p>1</p></td><td><p>0.9</p></td><td><p>1</p></td><td><p>44.5%</p></td></tr><tr><td><p><b>DuckDuckGo</b></p></td><td><p>0.1</p></td><td><p>0.2</p></td><td><p>0.2</p></td><td><p>0.2</p></td><td><p>0.3</p></td><td><p>0.3</p></td><td><p>0.3</p></td><td><p>0.2</p></td><td><p>116.3%</p></td></tr></table><p>Looking at the changes from January to July 2025:</p><ul><li><p><b>Anthropic</b> recorded the steepest decrease in bot to human traffic, down <b>86.7%</b>. From 286,930 bots per human in January, to 38,065 bots per human in July, the change shows a dramatic increase in referrals. Despite the change, it remains by far the most crawl-heavy platform, with tens of thousands of pages still crawled for every referral.</p></li><li><p><b>Perplexity</b> moved in the opposite direction, with bot crawling increasing <b>+256.7%</b> relative to human visitors; climbing from <b>54 bots per human</b> in January to <b>195 bots per human</b> in July. While the ratio is still far below Anthropic, the increase shows it is crawling more heavily, relative to the traffic it refers, than it did earlier.</p></li><li><p><b>OpenAI</b> ratio dropped slightly, from 1,217 bots per human in January to 1,091 in July (-10%). The shift is smaller than Anthropic’s but suggests OpenAI is sending a bit more referral traffic relative to its crawling.</p></li><li><p><b>Microsoft</b> stayed steady, with its ratio moving only slightly, from 38.5 bots per human in January to 40.7 in July (+6%). This consistency suggests stable behavior from Bing-linked services.</p></li><li><p><b>Yandex</b> increased from 15.5 bots per human in January to 21.4 in July (+38%). The overall ratio is far smaller than Anthropic’s or Perplexity’s, but it shows Yandex is crawling more heavily relative to the traffic it sends back.</p></li></ul><p>Alongside measuring crawling volumes and referral traffic (now also visible on the<a href="https://radar.cloudflare.com/ai-insights#ai-bot-best-practices"><u> AI Insights page of Cloudflare Radar</u></a>), it’s worth looking at whether AI operators follow good practices when deploying their bots. Cloudflare data shows that most leading AI crawlers are on our <a href="https://radar.cloudflare.com/bots#verified-bots"><u>verified bots</u></a> list, meaning their IP addresses match published ranges and they respect robots.txt. But adoption of newer standards like <a href="https://developers.cloudflare.com/bots/concepts/bot/verified-bots/web-bot-auth/"><u>WebBotAuth</u></a> — which uses cryptographic signatures in HTTP messages to confirm a request comes from a specific bot, and is especially relevant today — is still missing. </p><p>Meta, OpenAI, and Anthropic run distinct bots for different purposes, while Google and Microsoft rely on unified crawlers. Anthropic, however, still lags in verification, which makes it easier for bad actors to spoof its crawler and ignore robots.txt. Without verification, it’s difficult to distinguish real from fake traffic — leaving its compliance effectively unclear. (A longer list of AI bots is available <a href="https://radar.cloudflare.com/ai-insights#ai-bot-best-practices"><u>here</u></a>).</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4EvNGFKp6pGQUP84P33qJG/b646c0aad05d68d3f9c4a37d08bd483f/8.png" />
          </figure>
    <div>
      <h2>Conclusion and what’s next</h2>
      <a href="#conclusion-and-whats-next">
        
      </a>
    </div>
    <p>If training-related crawling continues to dominate while referrals stay flat, creators face a paradox: feeding AI systems without gaining traffic in return. Many want their content to appear in chatbot answers, but without monetization or cooperation, the incentive to produce quality work declines.</p><p>The Web now stands at a fork in the road. Either a new balance emerges — one where the new AI era helps sustain publishers and creators — or AI turns the open web into a one-way training set, extracting value with little flowing back.</p><p>You can learn more about some of these data trends on Cloudflare Radar’s updated<a href="https://radar.cloudflare.com/ai-insights"><u> AI Insights page</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[AI Week]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Radar]]></category>
            <category><![CDATA[Internet Trends]]></category>
            <category><![CDATA[Traffic]]></category>
            <category><![CDATA[Bots]]></category>
            <guid isPermaLink="false">71UVAVb7ICHgxWp6yhCLoA</guid>
            <dc:creator>João Tomé</dc:creator>
        </item>
        <item>
            <title><![CDATA[A deeper look at AI crawlers: breaking down traffic by purpose and industry]]></title>
            <link>https://blog.cloudflare.com/ai-crawler-traffic-by-purpose-and-industry/</link>
            <pubDate>Thu, 28 Aug 2025 14:05:00 GMT</pubDate>
            <description><![CDATA[ We are extending AI-related insights on Cloudflare Radar with new industry-focused data and a breakdown of bot traffic by purpose, such as training or user action.  ]]></description>
            <content:encoded><![CDATA[ <p>Search platforms historically crawled web sites with the implicit promise that, as the sites showed up in the results for relevant searches, they would send traffic on to those sites — in turn leading to ad revenue for the publisher. This model worked fairly well for several decades, with a whole industry emerging around optimizing content for optimal placement in search results. It led to higher click-through rates, more eyeballs for publishers, and, ideally, more ad revenue. However, the emergence of AI platforms over the last several years, and the incorporation of AI "overviews" into classic search platforms, has turned the model on its head. When users turn to these AI platforms with queries that used to go to search engines, they often won't click through to the original source site once an answer is provided — and that assumes that a link to the source is provided at all! No clickthrough, no eyeballs, and no ad revenue. </p><p>To provide a perspective on the scope of this problem, Radar <a href="https://blog.cloudflare.com/ai-search-crawl-refer-ratio-on-radar/"><u>launched</u></a> <a href="https://radar.cloudflare.com/ai-insights#crawl-to-refer-ratio"><u>crawl/refer ratios</u></a> on July 1, based on traffic seen across our whole customer base. These ratios effectively compare the number of crawling requests for HTML pages from the <a href="https://www.cloudflare.com/learning/bots/what-is-a-web-crawler/"><u>crawler</u></a> associated with a given platform, to the number of HTML page requests referred by that platform (measuring human traffic). This data complements insights into <a href="https://radar.cloudflare.com/ai-insights#ai-bot-crawler-traffic"><u>AI bot &amp; crawler traffic trends</u></a> that were <a href="https://blog.cloudflare.com/bringing-ai-to-cloudflare/#ai-bot-traffic-insights-on-cloudflare-radar"><u>launched</u></a> during Birthday Week 2024.</p><p>Today, we're adding two new capabilities to the <a href="https://radar.cloudflare.com/ai-insights"><b><u>AI Insights</u></b></a> page on Cloudflare Radar to give you more insight into this activity: industry-focused AI bot traffic data, and a new breakdown of AI bot traffic by its purpose.</p>
    <div>
      <h2>Traffic by type</h2>
      <a href="#traffic-by-type">
        
      </a>
    </div>
    <p>Since the launch of <a href="https://www.cloudflare.com/learning/ai/what-is-large-language-model/"><u>LLMs</u></a> into the public consciousness in November 2022, much of the crawling traffic seen from user agents associated with AI platforms has been to collect content used to train AI models. This crawling activity can be aggressive at times, often ignoring <a href="https://radar.cloudflare.com/ai-insights#ai-user-agents-found-in-robotstxt"><u>directives found in robots.txt files</u></a>. In addition to offering chatbots trained on this <a href="https://www.cloudflare.com/learning/bots/what-is-content-scraping/"><u>scraped content</u></a>, AI platforms have emerged that aim to replace classic search tools, while those tools have themselves integrated AI-powered summaries as part of their results. These platforms may crawl your site to build indexes for their search engines. And some AI platforms may crawl your site in response to a specific user prompt, such as looking for flights to plan a vacation.</p><p>The new <b>Crawl purpose</b> selector within the <b>AI bot &amp; crawler traffic</b> card allows users to select between <b>Training</b>, <b>Search</b>, <b>User action</b>, and <b>Undeclared</b>. (The latter is for crawlers where no information is available from the operator or other industry sources regarding its purpose.) </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4bIoxF54OFCmecoOWOHDQ3/8e252d3ffbb4f948a76158661a4b013a/1_-_crawlpurpose-dropdown.png" />
          </figure><p>Once a purpose is selected, the <a href="https://radar.cloudflare.com/ai-insights#http-traffic-by-bot"><b><u>HTTP traffic by bot</u></b></a> graph updates to show traffic trends over the selected time period for the top five most active AI bots that crawl for the selected purpose.</p><p>As an example, selecting <b>User action</b> results in a <a href="https://radar.cloudflare.com/ai-insights?dateStart=2025-07-01&amp;dateEnd=2025-07-28#http-traffic-by-bot"><u>graph</u></a> like the one below, which covers the first 28 days of July 2025. OpenAI’s <i>ChatGPT-User</i> bot is responsible for nearly three quarters of the request traffic from this cohort of crawlers. A daily cycle is clearly evident, suggesting regular usage of ChatGPT in that fashion, with such usage gradually increasing throughout the month. If <i>ChatGPT-User </i>is removed from the chart, <i>Perplexity-User</i> also exhibits a similar pattern.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/Vt5HUwATxJgWezhbpyA0N/f1b2745802ba4c1b7ee33b3c77b6ed4d/2_-_http_traffic_-_user_action.png" />
          </figure><p>A new <a href="https://radar.cloudflare.com/ai-insights#crawl-purpose"><b><u>Crawl purpose</u></b></a> graph has also been added to Radar, breaking out traffic trends by purpose. <i>Training</i> traffic, responsible for nearly 80% of the crawling from AI bots, is somewhat erratic in nature, with no clear cyclical pattern. However, such patterns are visible for the <i>User action</i> and <i>Undeclared</i> purposes, as shown in the <a href="https://radar.cloudflare.com/ai-insights?dateStart=2025-07-01&amp;dateEnd=2025-07-28#crawl-purpose"><u>graph</u></a> below, although they account for less than 5% of AI bot traffic across this time period.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2jis2lHk6KjbWpOQPcARmy/7ae33385be2ac1d820104a2dc22f489a/3_-_crawlpurpose-graph.png" />
          </figure><p>Within the <a href="https://radar.cloudflare.com/explorer?dataSet=ai.bots"><u>Data Explorer</u></a> view for the <b>AI Bots &amp; Crawlers</b> dataset, you can now <a href="https://radar.cloudflare.com/explorer?dataSet=ai.bots&amp;dt=28d&amp;groupBy=crawl_purpose"><u>break the data down by </u><b><u>Crawl purpose</u></b></a> to explore how the activity has changed over time. Alternatively, you can <a href="https://radar.cloudflare.com/explorer?dataSet=ai.bots&amp;dt=28d&amp;groupBy=user_agent&amp;filters=crawlPurpose%253DTraining"><u>break the data down by </u><b><u>User agent</u></b><u>, and filter by </u><b><u>Crawl purpose</u></b></a>, to explore traffic trends across a larger set of bots (beyond the top five). <a href="https://radar.cloudflare.com/explorer?dataSet=ai.bots&amp;dt=28d&amp;groupBy=user_agent&amp;filters=crawlPurpose%253DTraining&amp;timeCompare=1"><u>Comparisons with previous time periods</u></a> are available here as well.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6kCgMWSeVGYdQ9jnkAOMhe/ab71e21d0b620b78b72aaf90f7ecbb46/4_-_dataexplorer_-_training.png" />
          </figure>
    <div>
      <h2>Visibility by industry</h2>
      <a href="#visibility-by-industry">
        
      </a>
    </div>
    <p>You can use your own traffic data to see how aggressively crawlers <a href="https://www.cloudflare.com/learning/ai/how-to-prevent-web-scraping/">scrape</a> your content. You can also see how frequently they refer traffic back to you. However, you may also want to understand how those measurements compare with your peer group — are you being crawled more or less frequently, and are the platforms referring more or less traffic back to your sites? The new industry set filtering available for the <a href="https://radar.cloudflare.com/ai-insights#http-traffic-by-bot"><b><u>HTTP traffic by bot</u></b><u> graph</u></a> and the <a href="https://radar.cloudflare.com/ai-insights#crawl-to-refer-ratio"><b><u>Crawl-to-refer ratio</u></b><u> table</u></a> in the <a href="https://radar.cloudflare.com/ai-insights"><b><u>AI Insights</u></b></a> section of Radar can provide you with this perspective.</p><p>Within the <b>AI bot &amp; crawler traffic</b> card on the AI Insights page, select an industry set from the drop-down list at the top right of the card. The graphs in the <b>HTTP traffic by bot</b> and <b>Crawl purpose</b> sections of the card update to reflect the selection, as does the <b>Crawl-to-refer ratio</b> table. (Selecting a <b>Crawl purpose</b> from that drop-down menu will further update the <b>HTTP traffic by bot</b> graph.)</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6NBLZ4KnJ2A75L92a3bVK4/1665549e5761b0ae449d651a49ba7e64/5_-_industry_set_-_dropdown.png" />
          </figure><p>It is interesting to observe how the crawling patterns change between industry sets, along with the mix of most active bots and crawl-to-refer ratios. For example, across the first week of August, with <a href="https://radar.cloudflare.com/ai-insights?dateStart=2025-08-01&amp;dateEnd=2025-08-07#http-traffic-by-bot"><u>no vertical or crawl purpose selected</u></a>, <b>ClaudeBot</b> and <b>GPTBot</b> account for nearly half of the observed crawling activity, with <b>Meta-ExternalAgent</b> the only one among the top five exhibiting activity that remotely resembles a pattern. For the default view, Anthropic had the highest crawl-to-refer ratio at nearly 50,000:1, followed by OpenAI at 887:1 and Perplexity at 118:1.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2StNvYYHAK9PZ6U0tGvwiH/68266c10a50ef70507a645a5dfcc2059/6_-_http_traffic_-_no_vertical.png" />
          </figure><p>However, when the <a href="https://radar.cloudflare.com/ai-insights?industrySet=News+%26+Publications&amp;dateStart=2025-08-01&amp;dateEnd=2025-08-07"><b><u>News and Publications industry set is selected</u></b></a>, we see<b> </b>a much tighter distribution of traffic among the top five, ranging from <b>ChatGPT-User</b>’s 14.9% share of traffic to <b>GPTBot</b>’s 17.4% share. <b>ChatGPT-User</b>’s presence among the top five suggests that a significant number of users may have been asking questions about current events during that period of time. For these <b>News and Publications</b> sites, the crawl-to-refer ratios are lower than the default view, with Anthropic at 2,500:1, OpenAI at 152:1, and Perplexity at 32.7:1. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4EpH7k6tQKSMTdXIQtoG1y/7ad2383f442e390760d0eb2a3d3b7127/7_-_industry_set_-_news___publications.png" />
          </figure><p>As a third example, we find that the mix again shifts for the <a href="https://radar.cloudflare.com/ai-insights?industrySet=Computer+%26+Electronics&amp;dateStart=2025-08-01&amp;dateEnd=2025-08-07#http-traffic-by-bot"><b><u>Computer and Electronics industry set</u></b></a>. While <b>GPTBot</b> was again the most active AI bot, <b>Amazonbot</b> moved up into second place; together these bots now account for over 40% of crawling traffic. <b>ClaudeBot</b> and <b>Meta-ExternalAgent</b> both had a 13.9% share of the crawling traffic, with ByteDance’s <b>ByteSpider</b> rounding out the top five. The crawl-to-refer ratios for this vertical are again lower than for the unfiltered view, with Anthropic down to 8,800:1, OpenAI at 401.7:1, and Perplexity at 88:1.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5KjHMu0t6uCJAHEjgzDiNz/31267af8484006c6be1b834107cb3052/8_-_industry_set_-_computer___electronics.png" />
          </figure><p>Within Data Explorer, you can now break down <b>AI Bots &amp; Crawler</b> data by Vertical and Industry. (A vertical is a pre-defined collection of multiple related industries), and you can also filter <b>Crawl purpose</b> and <b>User agent</b> breakdowns by Vertical and Industry. For example, the graphs below illustrate the <a href="https://radar.cloudflare.com/explorer?dataSet=ai.bots&amp;groupBy=user_agent&amp;dt=2025-08-01_2025-08-07&amp;filters=vertical%253DFinance%252Cindustry%253DCryptocurrency#result"><u>traffic trends by AI crawler</u></a> for sites within the <b>Cryptocurrency</b> industry under the <b>Finance</b> vertical, as well as the <a href="https://radar.cloudflare.com/explorer?dataSet=ai.bots&amp;groupBy=crawl_purpose&amp;dt=2025-08-01_2025-08-07&amp;filters=vertical%253DFinance%252Cindustry%253DCryptocurrency#result"><u>traffic trends by crawl purpose</u></a> for that industry/vertical pair. While these sites see crawling traffic from quite a few bots, three-quarters of that traffic during the first week of August was concentrated in just four bots, and 80% of it was for gathering information to train models.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/39MVSCz4a41eKDqIR0Dj4Z/5489805b938051212ca0374e892ef756/9_-_dataexplorer_-_http_traffic_-_finance_cryptocurrency.png" />
          </figure>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7ppfZea6L4fdZ4RKWIVNq5/a605a2f3b45bb6ef540ca57d78bb145e/10_-_dataexplorer_-_crawl_purpose_-_finance_cryptocurrency.png" />
          </figure><p>Because the Industry sets shown on the main <b>AI Insights</b> page are manually curated collections of related industries, clicking through to the Data Explorer view from one of those graphs will pre-populate the Industry selector with the relevant entries. For example, clicking through from the <a href="https://radar.cloudflare.com/ai-insights?industrySet=Gaming+%26+Gambling#http-traffic-by-bot"><b><u>HTTP traffic by bot</u></b><u> graph for the </u><b><u>Gaming &amp; Gambling</u></b><u> industry set</u></a> results in the following <a href="https://radar.cloudflare.com/explorer?dataSet=ai.bots&amp;groupBy=user_agent&amp;filters=industry%253DComputer%25252520Games%25252CGambling%25252520%25252526%25252520Casinos%25252CGambling%25252520and%25252520Casinos%2525253B%25252520Recreation%25252CGaming&amp;dt=2025-08-01_2025-08-07"><u>Data Explorer view</u></a>, which lists the component industries.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/60FepjNCd25CFKWTQzdVsq/2772c2782c93772f4a55364f06846bd5/11_-_dataexplorer_-_gaming_gambling_industries.png" />
          </figure>
    <div>
      <h2>Conclusion</h2>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>AI crawler traffic has become a fact of life for content owners, and the complexity of dealing with it has increased as bots are used for purposes beyond LLM training. <a href="https://contentsignals.org/"><u>Work is underway</u></a> to allow website publishers to declare how automated systems should use their content. However, it will take some time for these proposed solutions to be standardized, and for both publishers and crawlers to adopt them. As the space evolves, we’ll continue to expand Cloudflare Radar’s insights into AI crawler activity.</p><p>If you share our AI-related graphs on social media, be sure to tag us: <a href="https://x.com/CloudflareRadar"><u>@CloudflareRadar</u></a> (X), <a href="https://noc.social/@cloudflareradar"><u>noc.social/@cloudflareradar</u></a> (Mastodon), and <a href="https://bsky.app/profile/radar.cloudflare.com"><u>radar.cloudflare.com</u></a> (Bluesky). If you have questions or comments, you can reach out to us on social media, or contact us via <a><u>email</u></a>.</p><div>
  
</div><p></p> ]]></content:encoded>
            <category><![CDATA[AI Week]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Radar]]></category>
            <category><![CDATA[Traffic]]></category>
            <category><![CDATA[Bots]]></category>
            <guid isPermaLink="false">6PuiWWmAnS4oHYFYoYysBU</guid>
            <dc:creator>David Belson</dc:creator>
        </item>
        <item>
            <title><![CDATA[Evaluating image segmentation models for background removal for Images]]></title>
            <link>https://blog.cloudflare.com/background-removal/</link>
            <pubDate>Thu, 28 Aug 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ An inside look at how the Images team compared dichotomous image segmentation models to identify and isolate subjects in an image from the background. ]]></description>
            <content:encoded><![CDATA[ <p>Last week, we wrote about <a href="https://blog.cloudflare.com/ai-face-cropping-for-images/"><u>face cropping for Images</u></a>, which runs an open-source face detection model in <a href="https://developers.cloudflare.com/workers-ai/"><u>Workers AI</u></a> to automatically crop images of people at scale.</p><p>It wasn’t too long ago when deploying AI workloads was prohibitively complex. Real-time inference previously required specialized (and costly) hardware, and we didn’t always have standard abstractions for deployment. We also didn’t always have Workers AI to enable developers — including ourselves — to ship AI features without this additional overhead.</p><p>And whether you’re skeptical or celebratory of AI, you’ve likely seen its explosive progression. New benchmark-breaking computational models are released every week. We now expect a fairly high degree of accuracy — the more important differentiators are how well a model fits within a product’s infrastructure and what developers do with its predictions.</p><p>This week, we’re introducing <a href="https://developers.cloudflare.com/images/transform-images/transform-via-url/#segment">background removal for Images</a>. This feature runs a dichotomous image segmentation model on Workers AI to isolate subjects in an image from their backgrounds. We took a controlled, deliberate approach to testing models for efficiency and accuracy.</p><p>Here’s how we evaluated various image segmentation models to develop background removal.</p>
    <div>
      <h2>A primer on image segmentation</h2>
      <a href="#a-primer-on-image-segmentation">
        
      </a>
    </div>
    <p>In computer vision, image segmentation is the process of splitting an image into meaningful parts.</p><p>Segmentation models produce a mask that assigns each pixel to a specific category. This differs from detection models, which don’t classify every pixel but instead mark regions of interest. A face detection model, such as the one that informs <a href="https://developers.cloudflare.com/images/transform-images/transform-via-url/#gravity"><u>face cropping</u></a>, draws bounding boxes based on where it thinks there are faces. (If you’re curious, <a href="https://blog.cloudflare.com/ai-face-cropping-for-images/#from-pixels-to-people"><u>our post on face cropping</u></a> discusses how we use these bounding boxes to perform crop and zoom operations.)</p><p>Salient object detection is a type of segmentation that highlights the parts of an image that most stand out. Most salient detection models create a binary mask that categorizes the most prominent (or salient) pixels as the “foreground” and all other pixels as the “background”. In contrast, a multi-class mask considers the broader context and labels each pixel as one of several possible classes, like “dog” or “chair”. These multi-class masks are the basis of content analysis models, which distinguish which pixels belong to specific objects or types of objects.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/qV2QVZYEdqdigCTuqBuHu/cf4873dddf3b30503aac6643ded1a5ab/image3.png" />
          </figure><p><sub>In this photograph of my dog, a detection model predicts that a bounding box contains a dog; a segmentation model predicts that some pixels belong to a dog, while all other pixels don’t.</sub></p><p>For our use case, we needed a model that could produce a soft saliency mask, which predicts how strongly each pixel belongs to either the foreground (objects of interest) or the background. That is, each pixel is assigned a value on a scale of 0–255, where 0 is completely transparent and 255 is fully opaque. Most background pixels are labeled at (or near) 0; foreground pixels may vary in opacity, depending on its degree of saliency.</p><p>In principle, a background removal feature must be able to accurately predict saliency across a broad range of contexts. For example, e-commerce and retail vendors want to display all products on a uniform, white background; in creative and image editing applications, developers want to enable users to create stickers and cutouts from uploaded content, including images of people or avatars.</p><p>In our research, we focused primarily on the following four image segmentation models:</p><ul><li><p><a href="https://arxiv.org/abs/2005.09007"><b><u>U</u></b><b><u><sup>2</sup></u></b><b><u>-Net (U Square Net)</u></b></a>: Trained on the largest saliency dataset (<a href="https://saliencydetection.net/duts/"><u>DUST-TR</u></a>) of 10,553 images, which were then horizontally flipped to reach a total of 21,106 training images.</p></li><li><p><a href="https://arxiv.org/abs/2203.03041"><b><u>IS-Net (Intermediate Supervision Network)</u></b></a>: A novel, two-step approach from the same authors of U2-Net; this model produces cleaner boundaries for images with noisy, cluttered backgrounds.</p></li><li><p><a href="https://arxiv.org/abs/2401.03407"><b><u>BiRefNet (Bilateral Reference Network)</u></b></a>: Specifically designed to segment complex and high-resolution images with accuracy by checking that the small details match the big picture.</p></li><li><p><a href="https://arxiv.org/abs/2304.02643"><b><u>SAM (Segment Anything Model)</u></b></a>: Developed by Meta to allow segmentation by providing prompts and input points.</p></li></ul><p>Different scales of information allow computational models to build a holistic view of an image. Global context considers the overall shape of objects and how areas of pixels relate to the entire image, while local context traces fine details like edges, corners, and textures. If local context focuses on the trees and their leaves, then global context represents the entire forest.</p><p><a href="https://github.com/xuebinqin/U-2-Net"><u>U</u><u><sup>2</sup></u><u>-Net</u></a> extracts information using a multi-scale approach, where it analyzes an image at different zoom levels, then combines its predictions in a single step. The model analyzes global and local context at the same time, so it works well on images with multiple objects of varying sizes.</p><p><a href="https://github.com/xuebinqin/DIS"><u>IS-Net</u></a> introduces a new, two-step strategy called intermediate supervision. First, the model separates the foreground from the background, identifying potential areas that likely belong to objects of interest — all other pixels are labeled as the background. Second, it refines the boundaries of the highlighted objects to produce a final pixel-level mask.</p><p>The initial suppression of the background results in cleaner, more precise edges, as the segmentation focuses only on the highlighted objects of interest and is less likely to mistakenly include background pixels in the final mask. This model especially excels when dealing with complex images with cluttered backgrounds.</p><p>Both models output their predictions in a single direction for scale. U<sup>2</sup>-Net interprets the global and local context in one pass, while Is-Net begins with the global context, then focuses on the local context.</p><p>In contrast, <a href="https://github.com/ZhengPeng7/BiRefNet"><u>BiRefNet</u></a> refines its predictions over multiple passes, moving in both contextual directions. Like Is-Net, it initially creates a map that roughly highlights the salient object, then traces the finer details. However, BiRefNet moves from global to local context, then from local context back to global. In other words, after refining the edges of the object, it feeds the output back to the large-scale view. This way, the model can check that the small-scale details align with the broader image structure, providing higher accuracy on high-resolution images.</p><p>U<sup>2</sup>-Net, IS-Net, and BiRefNet are exclusively saliency detection models, producing masks that distinguish foreground pixels from background pixels. However, <a href="https://github.com/facebookresearch/segment-anything"><u>SAM</u></a> was designed to be more extensible and general; its primary goal is to segment any object based on specified inputs, not only salient objects. This means that the model can also be used to create multi-class masks that label various objects within an image, even if they aren’t the primary focus of an image.</p>
    <div>
      <h2>How we measure segmentation accuracy</h2>
      <a href="#how-we-measure-segmentation-accuracy">
        
      </a>
    </div>
    <p>In most saliency datasets, the actual location of the object is known as the ground-truth area. These regions are typically defined by human annotators, who manually trace objects of interest in each image. This provides a reliable reference to evaluate model predictions.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6wAV8lQcsZHosKoFyEIce1/495b3d70960027b795ec1a62f2d46a59/BLOG-2928_3.png" />
          </figure><p><sub>Photograph by </sub><a href="https://www.linkedin.com/in/fang-allen"><sub><u>Allen Fang</u></sub></a></p><p>Each model outputs a predicted area (where it thinks the foreground pixels are), which can be compared against the ground-truth area (where the foreground pixels actually are).</p><p>Models are evaluated for segmentation accuracy based on common metrics like Intersection over Union, Dice coefficient, and pixel accuracy. Each score takes a slightly different approach to quantify the alignment between the predicted and ground-truth areas (“P” and “G”, respectively, in the formulas below).</p>
    <div>
      <h3>Intersection over Union</h3>
      <a href="#intersection-over-union">
        
      </a>
    </div>
    <p>Intersection over Union (IoU), also called the Jaccard index, measures how well the predicted area matches the true object. That is, it counts the number of foreground pixels that are shared in both the predicted and ground-truth masks. Mathematically, IoU is written as:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6zVQSLlKaFuVUQrDcAlf0Y/4254010745caf0d207d8f8e8181f4c9c/BLOG-2928_4.png" />
          </figure><p><sub>Jaccard formula</sub></p><p>The formula divides the intersection (P∩G), or the pixels where the predicted and ground-truth areas overlap, by the union (P∪G), or the total area of pixels that belong to either area, counting the overlapping pixels only once.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7KFLB15btpCQuKuTqakBjp/91e78ec6d565e3723c5d76b3a65a441d/unnamed__23_.png" />
          </figure><p>IoU produces a score between 0 and 1. A higher value indicates a closer overlap between the predicted and ground-truth areas. A perfect match, although rare, would score 1, while a smaller overlapping area brings the score closer to 0.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/oe82x3rPo8XoNnwG3KBRy/22f591adb6ab27b3ad05f91b13eddff7/BLOG-2928_6.png" />
          </figure>
    <div>
      <h3>Dice coefficient</h3>
      <a href="#dice-coefficient">
        
      </a>
    </div>
    <p>The Dice coefficient, also called the Sørensen–Dice index, similarly compares how well the model’s prediction matches reality, but is much more forgiving than the IoU score. It gives more weight to the shared pixels between the predicted and actual foreground, even if the areas differ in size. Mathematically, the Dice coefficient is written as:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4UiJUJrjagwkmQNvdkiPC3/e17eaa8f22f57114a91f1e58fc3a76fb/BLOG-2928_7.png" />
          </figure><p><sub>Sørensen–Dice formula</sub></p><p>The formula divides twice the intersection (P∩G) by the sum of pixels in both predicted and ground-truth areas (P+G), counting any overlapping pixels twice.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7vcFBAoRJ9wpyAt8m4Sn7x/8b1962de717701ff348e90ec8b86286e/BLOG-2928_8.png" />
          </figure><p>Like IoU, the Dice coefficient also produces a value between 0 and 1, indicating a more accurate match as it approaches 1.</p>
    <div>
      <h3>Pixel accuracy</h3>
      <a href="#pixel-accuracy">
        
      </a>
    </div>
    <p>Pixel accuracy measures the percentage of pixels that were correctly labeled as either the foreground or the background. Mathematically, pixel accuracy is written as:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/40HkiVe1a2i1dSguDk1TxO/990e49cd4d40a4eaa29078948bc9d7e8/unnamed__24_.png" />
          </figure><p><sub>Pixel accuracy formula</sub></p><p>The formula divides the number of correctly predicted pixels by the total number of pixels in the image.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1GX83EmXBSLhGlHvGFLqnn/f65fbd110f4b1d201f7585723ced0f34/image10.png" />
          </figure><p>The total area of correctly predicted pixels is the sum of foreground and background pixels that accurately match the ground-truth areas.</p><p>The correctly predicted foreground is the intersection of the predicted and ground-truth areas (P∩G). The inverse of the predicted area (P’, or 1–P) represents the pixels that the model identifies as the background; the inverse of the ground-truth area (G’, or 1–G) represents the actual boundaries of the background. When these two inverted areas overlap (P’∩G’, or (1–P)∩(1–G)), this intersection is the correctly predicted background.</p>
    <div>
      <h2>Interpreting the metrics</h2>
      <a href="#interpreting-the-metrics">
        
      </a>
    </div>
    <p>Of the three metrics, IoU is the most conservative measure of segmentation accuracy. Small mistakes, such as including extra background pixels in the predicted foreground, reduce the score noticeably. This metric is most valuable for applications that require precise boundaries, such as autonomous driving systems.</p><p>Meanwhile, the Dice coefficient rewards the overlapping pixels more heavily, and subsequently tends to be higher than the IoU score for the same prediction. In model evaluations, this metric is favored over IoU when it’s more important to capture the object than to penalize mistakes. For example, in medical imaging, the risk of missing a true positive substantially outweighs the inconvenience of flagging a false positive.</p><p>In the context of background removal, we biased toward the IoU score and Dice coefficient over pixel accuracy. Pixel accuracy can be misleading, especially when processing an image where background pixels comprise the majority of pixels.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7K8TWmRLdJNIza43UoXhD8/c9a42ed7074ce975afd8f7e783db5849/BLOG-2928_11.png" />
          </figure><p>For example, consider an image with 900 background pixels and 100 foreground pixels. A model that correctly predicts only 5 foreground pixels — 5% of all foreground pixels — will score deceptively high in pixel accuracy. Intuitively, we’d likely say that this model performed poorly. However, assuming all 900 background pixels were correctly predicted, the model maintains 90.5% pixel accuracy, despite missing the subject almost entirely.</p>
    <div>
      <h2>Pixels, predictions, and patterns</h2>
      <a href="#pixels-predictions-and-patterns">
        
      </a>
    </div>
    <p>To determine the most suitable model for the Images API, we performed a series of tests using the open-source <a href="https://github.com/danielgatis/rembg"><u>rembg</u></a> library, which combines all relevant models in a single interface.</p><p>Each model was tasked with outputting a prediction mask to label foreground versus background pixels. We pulled images from two saliency datasets: <a href="https://huggingface.co/datasets/schirrmacher/humans"><b><u>Humans</u></b></a> contains over 7,000 images of people with varying skin tones, clothing, and hairstyles, while <a href="https://xuebinqin.github.io/dis/index.html#overview"><b><u>DIS5K</u></b></a> (version 1.5) spans a vast range of objects and scenes. If a model contained variants that were pre-trained on specific types of segmentation (e.g. clothes, humans), then we repeated the tests for the generalized model and each variant.</p><p>Our experiments were executed on a GPU with 23 GB VRAM to mirror realistic hardware constraints, similar to the environment where we already run a face detection model. We also replicated the same tests on a larger GPU instance with 94 GB VRAM; this served as an upper-bound reference point to benchmark potential speed gains if additional compute were available. Cloudflare typically reserves larger GPUs for more compute-intensive <a href="https://developers.cloudflare.com/workers-ai/models/"><u>AI workloads</u></a> — we viewed these tests more as an exploration for comparison than as a production scenario.</p><p>During our analysis, we started to see key trends emerge:</p><p>On the smaller GPU, inference times were generally faster for lightweight models like U<sup>2</sup>-Net (176 MB) and Is-Net (179 MB). The average speed across both datasets were 307 milliseconds for U<sup>2</sup>-Net and 351 milliseconds for Is-Net. On the opposite end, BiRefNet (973 MB) had noticeably slower output times, averaging 821 milliseconds across its two generalized variants.</p><p>BiRefNet ran 2.4 times faster on the larger GPU, reducing its average inference time to 351 milliseconds — comparable to the other models, despite its larger size. In contrast, the lighter models did not show any notable speed gain with additional compute, suggesting that scaling hardware configurations primarily benefits heavier models. In <a href="https://blog.cloudflare.com/background-removal/#appendix-1-inference-time-in-milliseconds">Appendix 1</a> (“Inference Time in Milliseconds”), we compare speed across models and GPU instances.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/55Tk0RjbvoffPVQT85UJQe/ca1f2280768495f3be52425e642fdd25/BLOG-2928_12.png" />
          </figure><p>We also observed distinct patterns when comparing model performance across the two saliency datasets. Most notably, all models ran faster on the Humans dataset, where images of people tend to be single-subject and relatively uniform. The DIS5K dataset, in contrast, includes images with higher complexity — that is, images with more objects, cluttered backgrounds, or multiple objects of varying scales.</p><p>Slower predictions suggest a relationship between visual complexity and the computation needed to identify the important parts of an image. In other words, datasets with simpler, well-separated objects can be analyzed more quickly, while complex scenes require more computation to generate accurate masks.</p><p>Similarly, complexity challenges accuracy as much as it does efficiency. In our tests, all models demonstrated higher segmentation accuracy with the Humans dataset. In <a href="https://blog.cloudflare.com/background-removal/#appendix-2-measures-of-model-accuracy">Appendix 2</a> (“Measures of Model Accuracy”), we present our results for segmentation accuracy across both datasets.</p><p>Specialized variants scored slightly higher in accuracy compared to their generalized counterparts. But in broad, practical applications, selecting a specialized model for every input isn’t realistic, at least for our initial beta version. We favored general-purpose models that can produce accurate predictions without prior classification. For this reason, we excluded SAM — while powerful in its intended use cases, SAM is designed to work with additional inputs. On unprompted segmentation tasks, it produced lower accuracy scores (and much higher inference times) amongst the models we tested.</p><p>All BiRefNet variants showed greater accuracy compared to other models. The generalized variants (<code>-genera</code>l and <code>-dis</code>) were just as accurate as its more specialized variants like <code>-portrait</code>. The <code>birefnet-general</code> variant, in particular, achieved a high IoU score of 0.87 and Dice coefficient of 0.92, averaged across both datasets.</p><p>In contrast, the generalized U<sup>2</sup>-Net model showed high accuracy on the Humans dataset, reaching an IoU score of 0.89 and a Dice coefficient of 0.94, but received a low IoU score of 0.39 and Dice coefficient of 0.52 on the DIS5K dataset. The <code>isnet-general-use</code> model performed substantially better, obtaining an average IoU score of 0.82 and Dice coefficient of 0.89 across both datasets.</p><p>We observed whether models could interpret both the global and local context of an image. In some scenarios, the U<sup>2</sup>-Net and Is-Net models captured the overall gist of an image, but couldn’t accurately trace fine edges. We designed one test around measuring how well each model could isolate bicycle wheels; for variety, we included images across both interior and exterior backgrounds. Lower scoring models, while correctly labeling the area surrounding the wheel, struggled with the pixels between the thin spokes and produced prediction masks that included these background pixels.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6mzRTqXhZRk0GuzwuIRu4p/b251aa4f3dbeecc11dbba931623607e5/BLOG-2928_13.png" />
          </figure><p><sub>Photograph by </sub><a href="https://unsplash.com/photos/person-near-bike-p6OU_gENRL0"><sub><u>Yomex Owo on Unsplash</u></sub></a><sub></sub></p><p>In other scenarios, the models showed the opposite limitation: they produced masks with clean edges, but failed to identify the focus of the image. We ran another test using a photograph of a gray T-shirt against black gym flooring. Both generalized U<sup>2</sup>-Net and Is-Net models labeled only the logo as the salient object, creating a mask that omitted the rest of the shirt entirely. </p><p>Meanwhile, the BiRefNet model achieved high accuracy across both types of tests. Its architecture passes information bidirectionally, allowing details at the pixel level to be informed by the larger scene (and vice versa). In practice, this means that BiRefNet interprets how fine-grained edges fit into the broader object. For our beta version, we opted to use the BiRefNet model to drive decisions for background removal.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/741GSfhMn8MPykb6NkWUJV/1ef5006aea8f67a4faeec73862d97ced/BLOG-2928_14.png" />
          </figure><p><sub>Unlike lower scoring models, the BiRefNet model understood that the entire shirt is the true subject of the image.</sub></p>
    <div>
      <h2>Applying background removal with the Images API</h2>
      <a href="#applying-background-removal-with-the-images-api">
        
      </a>
    </div>
    <p>The Images API now supports <a href="https://developers.cloudflare.com/images/transform-images/transform-via-url/#segment">automatic background removal</a> for <a href="https://developers.cloudflare.com/images/upload-images/"><u>hosted</u></a> and <a href="https://developers.cloudflare.com/images/transform-images/"><u>remote</u></a> images. This feature is available in open beta to all Cloudflare users on <a href="https://developers.cloudflare.com/images/pricing/"><u>Free and Paid plans</u></a>.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3iglNDllwEMvg6ygDvTRNc/a354422efd166cb3b48ee10995e78aa4/unnamed__25_.png" />
          </figure><p>Use the <code>segment</code> parameter when optimizing an image through a <a href="https://developers.cloudflare.com/images/transform-images/transform-via-url/"><u>specially-formatted Images URL</u></a> or a <a href="https://developers.cloudflare.com/images/transform-images/transform-via-workers/"><u>worker</u></a>, and Cloudflare will isolate the subject of your image and convert the background into transparent pixels. This can be combined with <a href="https://developers.cloudflare.com/images/transform-images/transform-via-url/"><u>other optimization operations</u></a>, as shown in the transformation URL below: </p>
            <pre><code>example.com/cdn-cgi/image/gravity=face,zoom=0.5,segment=foreground,background=white/image.png</code></pre>
            <p>This request will:</p><ul><li><p>Crop the image toward the <a href="https://developers.cloudflare.com/images/transform-images/transform-via-url/#gravity"><u>detected face</u></a>.</p></li><li><p>Isolate the subject in the image, replacing the background with transparent pixels.</p></li><li><p><a href="https://developers.cloudflare.com/images/transform-images/transform-via-url/#background"><u>Fill the transparent pixels</u></a> with a solid white color (<code>#FFFFFF</code>).</p></li></ul><p>You can also <a href="https://developers.cloudflare.com/images/transform-images/bindings/"><u>bind the Images API</u></a> to your worker to build programmatic workflows that give more fine-grained control over how images will be optimized. To demonstrate how this works, I made a <a href="https://studio.yaydeanna.workers.dev/"><u>simple image editing app</u></a> for creating cutouts and overlays, built entirely on Images and <a href="https://developers.cloudflare.com/workers/"><u>Workers</u></a>. This can be used to create images <a href="https://studio.yaydeanna.workers.dev/?order=0%2C1%2C2&amp;i0=icecream&amp;vertEdge0=bottom&amp;vertVal0=0&amp;horEdge0=left&amp;h0=400&amp;bg0=1&amp;i1=pete&amp;vertEdge1=top&amp;horEdge1=left&amp;h1=700&amp;bg1=1&amp;i2=iceland&amp;vertEdge2=top&amp;horEdge2=left"><u>like the one below</u></a>. Here, we apply background removal to isolate the dog and ice cream cone, then overlay them on a landscape image.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6Z6t9ov1t3fbbQojbYbGDh/961cef0f06780bfd8c088772a7add796/image11.png" />
          </figure><p><sub>Photographs by </sub><a href="https://www.pexels.com/@guyjoben/"><sub><u>Guy Hurst</u></sub></a><sub> (landscape), </sub><a href="https://www.pexels.com/@oskar-gackowski-2150870625/"><sub><u>Oskar Gackowski</u></sub></a><sub> (ice cream), and me (dog)</sub></p><p>Here is a snippet that you can use to overlay images in a worker:</p>
            <pre><code>export default {
  async fetch(request,env) {
    const baseURL = "{image-url}";
    const overlayURL = "{image-url}";
    
    // Fetch responses from image URLs
    const [base, overlay] = await Promise.all([fetch(baseURL),fetch(overlayURL)]);

    return (
      await env.IMAGES
        .input(base.body)
        .draw(
          env.IMAGES.input(overlay.body)
            .transform({segment: "foreground"}), // Optimize the overlay image
            {top: 0} // Position the overlay
        )
        .output({format:"image/webp"})
    ).response();
  }
};</code></pre>
            <p>Background removal is another step in our ongoing effort to enable developers to build interactive and imaginative products. These features are an iterative process, and we’ll continue to refine our approach even further. We’re looking forward to sharing our progress with you.</p><p>Read more about applying background removal in our <a href="https://developers.cloudflare.com/images/transform-images/transform-via-url/#segment"><u>documentation</u></a>.</p>
    <div>
      <h3>Appendix 1: Inference Time in Milliseconds</h3>
      <a href="#appendix-1-inference-time-in-milliseconds">
        
      </a>
    </div>
    
    <div>
      <h4>23 GB VRAM GPU</h4>
      <a href="#23-gb-vram-gpu">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2e97UAIgglJ3kP3ozm8lZT/6a44de14aa5179071eb7bbb3c8f31feb/BLOG-2928_17.png" />
          </figure>
    <div>
      <h4>94 GB VRAM GPU</h4>
      <a href="#94-gb-vram-gpu">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2viOyCtbzsloUAvY8kXPJV/378feb50a1dd822d7c848133fbac6a3f/BLOG-2928_18.png" />
          </figure>
    <div>
      <h3>Appendix 2: Measures of Model Accuracy</h3>
      <a href="#appendix-2-measures-of-model-accuracy">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2G9hwnFrlT4eF2isWyaEjk/d3418df56dff686c27f46d96fc86c37f/BLOG-2928_19.png" />
          </figure><p></p> ]]></content:encoded>
            <category><![CDATA[AI Week]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Image Optimization]]></category>
            <category><![CDATA[Cloudflare Images]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <guid isPermaLink="false">q17H7D8gSkyNAPELuTHl9</guid>
            <dc:creator>Deanna Lam</dc:creator>
            <dc:creator>Diretnan Domnan</dc:creator>
        </item>
        <item>
            <title><![CDATA[The age of agents: cryptographically recognizing agent traffic]]></title>
            <link>https://blog.cloudflare.com/signed-agents/</link>
            <pubDate>Thu, 28 Aug 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare now lets websites and bot creators use Web Bot Auth to segment agents from verified bots, making it easier for customers to allow or disallow the many types of user and partner directed. ]]></description>
            <content:encoded><![CDATA[ <p>On the surface, the goal of handling bot traffic is clear: keep malicious bots away, while letting through the helpful ones. Some bots are evidently malicious — such as mass price scrapers or those testing stolen credit cards. Others are helpful, like the bots that index your website. Cloudflare has segmented this second category of helpful bot traffic through our <a href="https://developers.cloudflare.com/bots/concepts/bot/#verified-bots"><u>verified bots</u></a> program, <a href="https://developers.cloudflare.com/bots/concepts/bot/verified-bots/policy/"><u>vetting</u></a> and validating bots that are transparent about who they are and what they do.</p><p>Today, the rise of <a href="https://agents.cloudflare.com/"><u>agents</u></a> has transformed how we interact with the Internet, often blurring the distinctions between benign and malicious bot actors. Bots are no longer directed only by the bot owners, but also by individual end users to act on their behalf. These bots directed by end users are often working in ways that website owners want to allow, such as planning a trip, ordering food, or making a purchase.</p><p>Our customers have asked us for easier, more granular ways to ensure specific <a href="https://www.cloudflare.com/learning/bots/what-is-a-bot/"><u>bots</u></a>, <a href="https://www.cloudflare.com/learning/bots/what-is-a-web-crawler/"><u>crawlers</u></a>, and <a href="https://www.cloudflare.com/learning/ai/what-is-agentic-ai/"><u>agents</u></a> can reach their websites, while continuing to block bad actors. That’s why we’re excited to introduce <b>signed agents</b>, an extension of our verified bots program that gives a new bot classification in our security rules and in Radar. Cloudflare has long recognized agents — but we’re now endowing them with their own classification to make it even easier for our customers to set the traffic lanes they want for their website. </p>
    <div>
      <h2>The age of agents</h2>
      <a href="#the-age-of-agents">
        
      </a>
    </div>
    <p>Cloudflare has continuously expanded our verified bot categorization to include different functions as the market has evolved. For instance, we first announced our grouping of <a href="https://blog.cloudflare.com/ai-bots/"><u>AI crawler traffic as an official bot category</u></a> in 2023. And in 2024, when OpenAI announced a <a href="https://openai.com/index/searchgpt-prototype/"><u>new AI search prototype</u></a> and introduced <a href="https://platform.openai.com/docs/bots"><u>three different bots</u></a> with distinct purposes, we <a href="https://blog.cloudflare.com/cloudflare-ai-audit-control-ai-content-crawlers/"><u>added three new categories</u></a> to account for this innovation: AI Search, AI Assistant, and Archiver.</p><p>But the bot landscape is constantly evolving. Let's unpack a common type of verified AI bot — an AI crawler such as <a href="https://radar.cloudflare.com/bots/directory/gptbot"><u>GPTBot</u></a>. Even though the bot performs an array of tasks, the bot’s ultimate purpose is a singular, repetitive task on behalf of the operator of that bot: fetch and index information. Its intelligence is applied to performing that singular job on behalf of that bot owner. </p><p>Agents, though, are different. Think about an AI agent tasked by a user to "Book the best deal for a round-trip flight to New York City next month." These agents sometimes use remote browsing products like Cloudflare's <a href="https://developers.cloudflare.com/browser-rendering/"><u>Browser Rendering</u></a> and similar products from companies like Browserbase and Anchor Browser. And here is the key distinction: this particular type of bot isn’t operating on behalf of a single company, like OpenAI in the prior example, but rather the end users themselves. </p>
    <div>
      <h2>Introducing signed agents</h2>
      <a href="#introducing-signed-agents">
        
      </a>
    </div>
    <p>In May, we announced Web Bot Auth, a new method of <a href="https://blog.cloudflare.com/web-bot-auth/"><u>using cryptography to verify bot and agent traffic</u></a>. HTTP message signatures allow bots to authenticate themselves and allow customer origins to identify them. This is one of the authentication methods we use today for our verified bots program. </p><p>What, exactly, is a <a href="https://developers.cloudflare.com/bots/concepts/bot/signed-agents/"><u>signed agent</u></a>? First, they are agents that are generally directed by an end user instead of a single company or entity. Second, the infrastructure or remote browsing platform the agents use is signing their HTTP requests via Web Both Auth, with Cloudflare validating these message signatures. And last, they comply with our <a href="https://developers.cloudflare.com/bots/concepts/bot/signed-agents/policy/"><u>signed agent policy</u></a>.</p><p>The signed agents classification improves on our existing frameworks in a couple of ways:</p><ol><li><p><b>Increased precision and visibility:</b> we’ve updated the <i>Cloudflare bots and agents directory to include signed agents</i> in addition to verified bots. This allows us to verify the cryptographic signatures of a much wider set of automated traffic, and our customers to granularly apply their security preferences more easily. Bot operators can now <i>submit signed agent applications from the Cloudflare dashboard</i>, allowing bot owners to specify to us how they think we should segment their automated traffic. </p></li><li><p><b>Easier controls from security rules</b>: similar to how they can take action on verified bots as a group, our Enterprise customers will be able to take action on <i>signed agents as a group when configuring their security rules</i>. This new field will be available in the Cloudflare dashboard under security rules soon.</p></li></ol><p>To apply to have an agent added to Cloudflare’s directory of bots and agents, customers should complete the <a href="https://dash.cloudflare.com?to=/:account/configurations/bot-submission-form"><u>Bot Submission Form</u></a> in the Cloudflare dashboard. Here, they can specify whether the submission should be considered for the signed agents list or the verified bots list. All signed agents will be recognized by their cryptographic signatures through <a href="https://datatracker.ietf.org/doc/html/draft-meunier-web-bot-auth-architecture"><u>Web Bot Auth validation</u></a>. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5caeGdhlmI3dO3GNZKeEUg/0dac239a94732404861b3876f6bdb8b6/BLOG-2930_2.png" />
          </figure><p><sub>The Bot Submission Form, available in the Cloudflare dashboard for bot owners to submit both verified bot and signed agent applications.</sub></p><p>We want to be clear: our verified bots program isn’t going anywhere. In fact, well-behaved and transparent applications that make use of signed agents can further qualify to be a verified bot, if their specific service adheres to our <a href="https://developers.cloudflare.com/bots/concepts/bot/verified-bots/policy/"><u>policy</u></a>. For instance,<a href="https://radar.cloudflare.com/scan"> <u>Cloudflare Radar's URL Scanner</u></a>, which relies on Browser Rendering as a service to scan URLs, is a <a href="https://radar.cloudflare.com/bots/directory/cloudflare-radar-url-scanner"><u>verified bot</u></a>. While Browser Rendering itself does not qualify to be a verified bot, URL Scanner does, since the bot owner (in this case, Cloudflare Radar) directs the traffic sent by the bot and always identifies itself with a unique Web Bot Auth signature — distinct from <a href="https://developers.cloudflare.com/browser-rendering/reference/automatic-request-headers/"><u>Browser Rendering’s signature</u></a>. </p>
    <div>
      <h2>From an agent’s perspective… </h2>
      <a href="#from-an-agents-perspective">
        
      </a>
    </div>
    <p>Since the launch of Web Bot Auth, our own Browser Rendering product has been sending signed Web Bot Auth HTTP headers, and is always given a bot score of 1 for our Bot Management customers. As of today, Browser Rendering will now show up in this new signed agent category. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1F8Z0E6WqJTxLf9G3PLB3a/84e80539be402066fe02ab60c431100a/BLOG-2930_3.png" />
          </figure><p>We’re also excited to announce the first cohort of agents that we’re partnering with and will be classifying as signed agents: <a href="https://openai.com/index/introducing-chatgpt-agent/"><u>ChatGPT agent</u></a>, <a href="https://block.xyz/inside/block-open-source-introduces-codename-goose"><u>Goose</u></a> from Block, <a href="https://docs.browserbase.com/introduction/what-is-browserbase"><u>Browserbase</u></a>, and <a href="https://anchorbrowser.io/"><u>Anchor Browser</u></a>. They are perfect examples of this new classification because their remote browsers are used by their end customers, not necessarily the companies themselves. We’re thrilled to partner with these teams to take this critical step for the AI ecosystem:</p><blockquote><p>“<i>When we built Goose as an open source tool, we designed it to run locally with an extensible architecture that lets developers automate complex workflows. As Goose has evolved to interact with external services and third-party sites on users' behalf, Web Bot Auth enables those sites to trust Goose while preserving what makes it unique. </i><b><i>This authentication breakthrough unlocks entirely new possibilities for autonomous agents</i></b>." – <b>Douwe Osinga</b>, Staff Software Engineer, Block</p></blockquote><blockquote><p><i>"At Browserbase, we provide web browsing capabilities for some of the largest AI applications. We're excited to partner with Cloudflare to support the adoption of Web Bot Auth, a critical layer of identity for agents. </i><b><i>For AI to thrive, agents need reliable, responsible web access.</i></b><i>"</i>  – <b>Paul Klein</b>, CEO, Browserbase</p></blockquote><blockquote><p><i>“Anchor Browser has partnered with Cloudflare to let developers ship verified browser agents. This way </i><b><i>trustworthy bots get reliable access while sites stay protected</i></b><i>.”</i> – <b>Idan Raman</b>, CEO, Anchor Browser</p></blockquote>
    <div>
      <h2>Updated visibility on Radar</h2>
      <a href="#updated-visibility-on-radar">
        
      </a>
    </div>
    <p>We want everyone to be in the know about our bot classifications. Cloudflare began publishing verified bots on our Radar page <a href="https://radar.cloudflare.com/bots#verified-bots"><u>back in 2022</u></a>, meaning anyone on the Internet — Cloudflare customer or not — can see all of our <a href="https://radar.cloudflare.com/bots#verified-bots"><u>verified bots on Radar</u></a>. We dynamically update the list of bots, but show more than just a list: we announced on <a href="https://www.cloudflare.com/en-gb/press-releases/2025/cloudflare-just-changed-how-ai-crawlers-scrape-the-internet-at-large/"><u>Content Independence Day</u></a> that <a href="https://blog.cloudflare.com/ai-search-crawl-refer-ratio-on-radar/#one-more-thing"><u>every verified bot would get its own page</u></a> in our public-facing directory on Radar, which includes the traffic patterns that we see for each bot.</p><p>Our directory has been updated to include <a href="https://radar.cloudflare.com/bots/directory"><b><u>both signed agents and verified bots</u></b></a> — we share exactly how Cloudflare classifies the bots that it recognizes, plus we surface all of the traffic that Cloudflare observes from these many recognized agents and bots. Through this updated directory, we’re not only giving better visibility to our customers, but also striving to set a higher standard for transparency of bot traffic on the Internet. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/65QPFjmbBde3EzHTOwElSL/cccc8f23c37716c251e0c21850855265/BLOG-2930_4.png" />
          </figure><p><sub>Cloudflare Radar’s Bots Directory, which lists verified bots and signed agents. This view is filtered to view only agent entries.</sub></p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2wBz7UwrQQzT7rJJnXiF8C/16eed3f1afd95cac32c4bcb647c6e5e6/BLOG-2930_5.png" />
          </figure><p><sub>Cloudflare Radar’s signed agent page for ChatGPT agent, which includes its traffic patterns for the last 7 days, from August 21, 2025 to August 27, 2025. </sub></p>
    <div>
      <h2>What’s now, what’s next</h2>
      <a href="#whats-now-whats-next">
        
      </a>
    </div>
    <p>As of today, the Cloudflare bot directory supports both bots and agents in a more clear-cut way, and customers or agent creators can submit agents to be signed and recognized <a href="https://dash.cloudflare.com/?to=/:account/configurations/bot-submission-form"><u>through their account dashboard</u></a>. In addition, anyone can see our signed agents and their traffic patterns on Radar. Soon, customers will be able to take action on signed agents as a group within their firewall rules, the same way you can take action on our verified bots. </p><p>Agents are changing the way that humans interact with the Internet. Websites need to know what tools are interacting with them, and for the builders of those tools to be able to easily scale. Message signatures help achieve both of these goals, but this is only step one. Cloudflare will continue to make it easier for agents and websites to interact (or not!) at scale, in a seamless way. </p><p>
</p> ]]></content:encoded>
            <category><![CDATA[AI Week]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[AI Bots]]></category>
            <category><![CDATA[Bot Management]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">1LQFWI1jzZnWAqR4iFMLLi</guid>
            <dc:creator>Jin-Hee Lee</dc:creator>
        </item>
        <item>
            <title><![CDATA[Make Your Website Conversational for People and Agents with NLWeb and AutoRAG]]></title>
            <link>https://blog.cloudflare.com/conversational-search-with-nlweb-and-autorag/</link>
            <pubDate>Thu, 28 Aug 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ With NLWeb, an open project by Microsoft, and Cloudflare AutoRAG, conversational search is now a one-click setup for your website. ]]></description>
            <content:encoded><![CDATA[ <p>Publishers and content creators have historically relied on traditional keyword-based search to help users navigate their website’s content. However, traditional search is built on outdated assumptions: users type in keywords to indicate intent, and the site returns a list of links for the most relevant results. It’s up to the visitor to click around, skim pages, and piece together the answer they’re looking for. </p><p><a href="https://www.cloudflare.com/learning/ai/what-is-artificial-intelligence/"><u>AI</u></a> has reset expectations and that paradigm is breaking: how we search for information has fundamentally changed.</p>
    <div>
      <h2>Your New Type of Visitors</h2>
      <a href="#your-new-type-of-visitors">
        
      </a>
    </div>
    <p>Users no longer want to search websites the old way. They’re used to interacting with AI systems like Copilot, Claude, and ChatGPT, where they can simply ask a question and get an answer. We’ve moved from search engines to answer engines. </p><p>At the same time, websites now have a new class of visitors, AI agents. Agents face the same pain with keyword search: they have to issue keyword queries, click through links, and scrape pages to piece together answers. But they also need more: a structured way to ask questions and get reliable answers across websites. This means that websites need a way to give the agents they trust controlled access, so that information is retrieved accurately.</p><p>Website owners need a way to participate in this shift.</p>
    <div>
      <h2>A New Search Model for the Agentic Web</h2>
      <a href="#a-new-search-model-for-the-agentic-web">
        
      </a>
    </div>
    <p>If AI has reset expectations, what comes next? To meet both people and agents where they are, websites need more than incremental upgrades to keyword search. They need a model that makes conversational access to content a first-class part of the web itself.</p><p>That’s what we want to deliver: combining an open standard (NLWeb) with the infrastructure (AutoRAG) to make it simple for any website to become AI-ready.</p><p><a href="https://news.microsoft.com/source/features/company-news/introducing-nlweb-bringing-conversational-interfaces-directly-to-the-web/"><u>NLWeb</u></a> is an open project developed by Microsoft that defines a standard protocol for natural-language queries on websites. Each NLWeb instance also operates as a Model Context Protocol (MCP) server. Cloudflare is building to this spec and actively working with Microsoft to extend the standard with the goal to let every site function like an AI app, so users and agents alike can query its contents naturally.</p><p><a href="https://developers.cloudflare.com/autorag/"><u>AutoRAG</u></a>, Cloudflare’s managed retrieval engine, can automatically crawl your website, store the content in R2, and embed it into a managed vector database. AutoRAG keeps the index fresh with continuous re-crawling and re-indexing. Model inference and embedding can be served through Workers AI. Each AutoRAG is paired with an AI Gateway that can provide <a href="https://www.cloudflare.com/learning/performance/what-is-observability/">observability and insights</a> into your AI model usage. This gives you a <a href="https://www.cloudflare.com/learning/ai/how-to-build-rag-pipelines/">complete, managed pipeline</a> for conversational search without the burden of managing custom infrastructure.</p><blockquote><p><i>“Together, NLWeb and AutoRAG let publishers go beyond search boxes, making conversational interfaces for websites simple to create and deploy. This integration will enable every website to easily become AI-ready for both people and trusted agents.”</i> – R.V. Guha, creator of NLWeb, CVP and Technical Fellow at Microsoft. </p></blockquote><p>We are optimistic this will open up new monetization models for publishers:</p><blockquote><p><i>"The challenges publishers have faced are well known, as are the risks of AI accelerating the collapse of already challenged business models. However, with NLWeb and AutoRAG, there is an opportunity to reset the nature of relationships with audiences for the better. More direct engagement on Publisher Owned and Operated (O&amp;O) environments, where audiences value the brand and voice of the Publisher, means new potential for monetization. This would be the reset the entire industry needs."</i>  – Joe Marchese, General &amp; Build Partner at Human Ventures.</p></blockquote>
    <div>
      <h2>One-Click to Make Your Site Conversational</h2>
      <a href="#one-click-to-make-your-site-conversational">
        
      </a>
    </div>
    <p>By combining NLWeb's standard with Cloudflare’s AutoRAG infrastructure, we’re making it possible to  easily bring conversational search to any website.</p><p>Simply select your domain in AutoRAG, and it will crawl and index your site for semantic querying. It then deploys a Cloudflare Worker, which acts as the access layer. This Worker implements the NLWeb standard and UI defined by the <a href="https://github.com/nlweb-ai/NLWeb"><u>NLWeb project</u></a> and exposes your indexed content to both people and AI agents.

The Worker includes:</p><ul><li><p><b>`/ask` endpoint:</b> The defined standard for how conversational web searches should be served. Powers the conversational UI at the root `/` as well as the embeddable preview at `/snippet.html`. It supports chat history so queries can build on one another within the same session, and includes automatic query decontextualization to improve retrieval quality.</p></li><li><p><b>`/mcp` endpoint: </b>Implements an MCP server that trusted AI agents can connect to for structured access.</p></li></ul><p>With this setup, your site content is immediately available in two ways for you to experiment: through a conversational UI that you can serve to your visitors, and through a structured MCP interface that lets trusted agents query your site reliably on your terms.</p><p>Additionally, if you prefer to deploy and host your own version of the NLWeb project, there’s also the option to use AutoRAG as the retrieval engine powering the <a href="https://github.com/nlweb-ai/NLWeb/blob/main/docs/setup-cloudflare-autorag.md"><u>NLWeb instance</u></a>.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1SM7rSQDhoR4fH5KgAJPD7/2266dc2e3c80f3fcc7f17014eb1d0cf1/image5.png" />
          </figure>
    <div>
      <h2>How Your Site Becomes Conversational</h2>
      <a href="#how-your-site-becomes-conversational">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/xkeREv3GwXwBZw52Dg6XQ/caeb587819d08eff53a33aa893032b78/image2.png" />
          </figure><p>From your perspective, making your site conversational is just a single click. Behind the scenes, AutoRAG spins up a full retrieval pipeline to make that possible:</p><ol><li><p><b>Crawling and ingestion: </b>AutoRAG explores your site like a search engine, following `sitemap.xml` and `robots.txt` files to understand what pages are available and allowed for crawling. From there, it follows your sitemap to discover pages within your domain (up to 100k pages). <a href="https://developers.cloudflare.com/browser-rendering/"><u>Browser Rendering</u></a> is used to load each page so that it can capture dynamic, JavaScript content. Crawled pages are downloaded into an <a href="https://developers.cloudflare.com/r2/"><u>R2 bucket</u></a> in your account before being ingested. </p></li><li><p><b>Continuous Indexing:</b> Once ingested, the content is parsed and embedded into <a href="https://developers.cloudflare.com/vectorize/"><u>Vectorize</u></a>, making it queryable beyond keyword matching through semantic search. AutoRAG automatically re-crawls and re-indexes to keep your knowledge base aligned with your latest content.</p></li><li><p><b>Access &amp; Observability: </b>A Cloudflare Worker is deployed in your account to serve as the access layer that implements the NLWeb protocol (you can also find the deployable Worker in the Workers <a href="https://github.com/cloudflare/templates"><u>templates repository</u></a>). Workers AI is used to seamlessly power the summarization and decontextualized query capabilities to improve responses. <i>Soon, with the</i><a href="http://blog.cloudflare.com/ai-gateway-aug-2025-refresh/"><i><u> AI Gateway and Secret Store BYO keys</u></i></a><i>, you’ll be able to connect models from any provider and select them directly in the AutoRAG dashboard.</i></p></li></ol>
    <div>
      <h2>Road to Making Websites a First-Class Data Source</h2>
      <a href="#road-to-making-websites-a-first-class-data-source">
        
      </a>
    </div>
    <p>Until now, <a href="https://developers.cloudflare.com/autorag/concepts/how-autorag-works/"><u>AutoRAG</u></a> only supported R2 as a data source. That worked well for structured files, but we needed to make a website itself a first-class data source to be indexed and searchable. Making that possible meant building website crawling into AutoRAG and strengthening the system to handle large, dynamic sources like websites.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5ouTCcbipVX3s1fPgg6hEs/541a03efb4365370fee5df67cd68841f/image4.png" />
          </figure><p>Before implementing our web crawler, we needed to improve the reliability of data syncs. Prior users of AutoRAG lacked visibility into when indexing syncs ran and whether they were successful. To fix this, we introduced a Job module to track all syncs, store history, and provide logs. This required two new Durable Objects to be added into AutoRAG’s architecture:</p><ul><li><p><b>JobManager</b> runs a complete sync, and its duties include queuing files, embedding content, and keeping the Vectorize database up to date.  To ensure data consistency, only one JobManager can run per RAG at a time, enforced by the RagManager (a Durable Object in our existing architecture), which cancels any running jobs before starting new ones which can be triggered either manually or by a scheduled sync.</p></li><li><p><b>FileManager</b> solved scalability issues we hit when Workers ran out of memory during parallel processing. Originally, a single Durable Object was responsible for handling multiple files, but with a 128MB memory limit it quickly became a bottleneck. The solution was to break the work apart: JobManager now distributes files across many FileManagers, each responsible for a single file. By processing 20 files in parallel through 20 different FileManagers, we expanded effective memory capacity from 128MB to roughly 2.5GB per batch.</p></li></ul><p>With these improvements, we were ready to build the website parser. By reusing our existing R2-based queuing logic, we added crawling with minimal disruption:</p><ol><li><p>A JobManager designated for a website crawl begins by reading the sitemaps associated with the RAG configuration.</p></li><li><p>Instead of listing objects from an R2 bucket, it queues each website link into our existing R2-based queue, using the full URL as the R2 object key.</p></li><li><p>From here, the process is nearly identical to our file-based sync. A FileManager picks up the job and checks if the RAG is configured for website parsing.</p></li><li><p>If it is, the FileManager crawls the link and places the page's HTML contents into the user's R2 bucket, again using the URL as the object key.</p></li></ol><p>After these steps, we index the data and serve it at query time. This approach maximized code reuse, and any improvements to our <a href="https://blog.cloudflare.com/markdown-for-agents/">HTML-to-Markdown conversion</a> now benefit both file and website-based RAGs automatically.</p>
    <div>
      <h2>Get Started Today</h2>
      <a href="#get-started-today">
        
      </a>
    </div>
    <p>Getting your website ready for conversational search through NLWeb and AutoRAG is simple. Here’s how:</p><ol><li><p>In the <b>Cloudflare Dashboard</b>, navigate to <b>Compute &amp; AI &gt; AutoRAG</b>.</p></li><li><p>Select <b>Create</b> in AutoRAG, then choose the <b>NLWeb Website</b> quick deploy option.</p></li><li><p>Select the <b>domain</b> from your Cloudflare account that you want indexed.</p></li><li><p>Click <b>Start indexing</b>.</p></li></ol><p>That’s it! You can now try out your NLWeb search experience via the provided link, and test out how it will look on your site by using the embeddable snippet.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/dI9xwOKdn3jGkYKWK8NEN/e25ae13199eb09577868e421cc1fef7d/image1.png" />
          </figure><p>We’d love to hear your feedback as you experiment with this new capability and share your thoughts with us at <a>nlweb@cloudflare.com</a>.</p><p></p> ]]></content:encoded>
            <category><![CDATA[AI Week]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Search Engine]]></category>
            <category><![CDATA[Microsoft]]></category>
            <category><![CDATA[Auto Rag]]></category>
            <guid isPermaLink="false">1FRpZMePLmgD9cPqJnMFKS</guid>
            <dc:creator>Catarina Pires Mota</dc:creator>
            <dc:creator>Gabriel Massadas</dc:creator>
            <dc:creator>Nelson Duarte</dc:creator>
            <dc:creator>Daniel Leal</dc:creator>
            <dc:creator>Anni Wang</dc:creator>
        </item>
        <item>
            <title><![CDATA[The next step for content creators in working with AI bots: Introducing AI Crawl Control]]></title>
            <link>https://blog.cloudflare.com/introducing-ai-crawl-control/</link>
            <pubDate>Thu, 28 Aug 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare launches AI Crawl Control (formerly AI Audit) and introduces easily customizable 402 HTTP responses. ]]></description>
            <content:encoded><![CDATA[ <p><i>Empowering content creators in the age of AI with smarter crawling controls and direct communication channels</i></p><p>Imagine you run a regional news site. Last month an AI bot scraped 3 years of archives in minutes — with no payment and little to no referral traffic. As a small company, you may struggle to get the AI company's attention for a licensing deal. Do you block all crawler traffic, or do you let them in and settle for the few referrals they send? </p><p>It’s picking between two bad options.</p><p>Cloudflare wants to help break that stalemate. On July 1st of this year, we declared <a href="https://www.cloudflare.com/press-releases/2025/cloudflare-just-changed-how-ai-crawlers-scrape-the-internet-at-large/"><u>Content Independence Day</u></a> based on a simple premise: creators deserve control of how their content is accessed and used. Today, we're taking the next step in that journey by releasing AI Crawl Control to general availability — giving content creators and AI crawlers an important new way to communicate.</p>
    <div>
      <h2>AI Crawl Control goes GA</h2>
      <a href="#ai-crawl-control-goes-ga">
        
      </a>
    </div>
    <p>Today, we're rebranding our AI Audit tool as <b>AI Crawl Control</b> and moving it from beta to <b>general availability</b>. This reflects the tool's evolution from simple monitoring to detailed insights and <a href="https://www.cloudflare.com/learning/ai/how-to-block-ai-crawlers/">control over how AI systems can access your content</a>. </p><p>The market response has been overwhelming: content creators across industries needed real agency, not just visibility. AI Crawl Control delivers that control.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/pIAbmCR0tTK71umann3w0/e570c5f898e3d399babf6d1f82c2f3d8/image3.png" />
          </figure>
    <div>
      <h2>Using HTTP 402 to help publishers license content to AI crawlers</h2>
      <a href="#using-http-402-to-help-publishers-license-content-to-ai-crawlers">
        
      </a>
    </div>
    <p>Many content creators have faced a binary choice: either they block all AI crawlers and miss potential licensing opportunities and referral traffic; or allow them through without any compensation. Many content creators had no practical way to say "we're open for business, but let's talk terms first."</p><p>Our customers are telling us:</p><ul><li><p>We want to license our content, but crawlers don't know how to reach us. </p></li><li><p>Blanket blocking feels like we're closing doors on potential revenue and referral traffic. </p></li><li><p>We need a way to communicate our terms before crawling begins. </p></li></ul><p>To address these needs, we are making it easier than ever to send customizable<b> </b><a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/402">402 HTTP status codes</a>. </p><p>Our <a href="https://blog.cloudflare.com/introducing-pay-per-crawl/#what-if-i-could-charge-a-crawler"><u>private beta launch of Pay Per Crawl</u></a> put the HTTP 402 (“Payment Required”) response codes to use, working in tandem with Web Bot Auth to enable direct payments between agents and content creators. Today, we’re making customizable 402 response codes available to every paid Cloudflare customer — not just pay per crawl users.</p><p>Here's how it works: in AI Crawl Control, paying Cloudflare customers will be able to select individual bots to block with a configurable message parameter and send 402 payment required responses. Think: "To access this content, email partnerships@yoursite.com or call 1-800-LICENSE" or "Premium content available via API at api.yoursite.com/pricing."</p><p>On an average day, Cloudflare customers are already sending over one billion 402 response codes. This shows a deep desire to move beyond blocking to open communication channels and new monetization models. With the 402 HTTP status code, content creators can tell crawlers exactly how to properly license their content, creating a direct path from crawling to a commercial agreement. We are excited to make this easier than ever in the AI Crawl Control dashboard. </p>
    <div>
      <h2>How to customize your 402 status code with AI Crawl Control: </h2>
      <a href="#how-to-customize-your-402-status-code-with-ai-crawl-control">
        
      </a>
    </div>
    <p><b>For Paid Plan Users:</b></p><ul><li><p>When you block individual crawlers from the AI Crawl Control dashboard, you can now choose to send 402 Payment Required status codes and customize your message. For example: <b>To access this content, email partnerships@yoursite.com or call 1-800-LICENSE</b>.</p></li></ul><p>The response will look like this:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5v5x41azcAK14DBhXjXPEX/8c0960b4bb556d62e88d19c9dd544f12/image4.png" />
          </figure><p>The message can be configured from Settings in the AI Crawl Control Dashboard:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2KMdRYwoey9RdYIxmzmFO1/7b39fd82d43349ee1cc4832cb602eb56/image1.png" />
          </figure>
    <div>
      <h2>Beyond just blocking AI bots</h2>
      <a href="#beyond-just-blocking-ai-bots">
        
      </a>
    </div>
    <p>This is just the beginning. We're planning to add additional parameters that will let crawlers understand the content's value, freshness, and licensing terms directly in the 402 response. Imagine crawlers receiving structured data about content quality and update frequency, for example, in addition to contact information.</p><p>Meanwhile, <a href="https://blog.cloudflare.com/introducing-pay-per-crawl/">pay per crawl</a> continues advancing through beta, giving content creators the infrastructure to automatically monetize crawler access with transparent, usage-based pricing.</p><p>What excites us most is the market shift we're seeing. We're moving to a world where content creators have clear monetization paths to become active participants in the development of rich AI experiences. </p><p>The 402 response is a bridge between two industries that want to work together: content creators whose work fuels AI development, and AI companies who need high-quality data. Cloudflare’s AI Crawl Control creates the infrastructure for these partnerships to flourish.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/31Np3qX2ssbeGaJnZHQodA/92246d3618778715c2e8b295b7acaa29/image5.png" />
          </figure><div>
  
</div><p></p> ]]></content:encoded>
            <category><![CDATA[AI Week]]></category>
            <category><![CDATA[Pay Per Crawl]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[AI Bots]]></category>
            <category><![CDATA[Bot Management]]></category>
            <guid isPermaLink="false">3UcNgGUfIUIm0EEtNwgLAT</guid>
            <dc:creator>Will Allen</dc:creator>
            <dc:creator>Pulkita Kini</dc:creator>
            <dc:creator>Cam Whiteside</dc:creator>
        </item>
        <item>
            <title><![CDATA[AI Gateway now gives you access to your favorite AI models, dynamic routing and more — through just one endpoint]]></title>
            <link>https://blog.cloudflare.com/ai-gateway-aug-2025-refresh/</link>
            <pubDate>Wed, 27 Aug 2025 14:05:00 GMT</pubDate>
            <description><![CDATA[ AI Gateway now gives you access to your favorite AI models, dynamic routing and more — through just one endpoint. ]]></description>
            <content:encoded><![CDATA[ <p>Getting the observability you need is challenging enough when the code is deterministic, but AI presents a new challenge — a core part of your user’s experience now relies on a non-deterministic engine that provides unpredictable outputs. On top of that, there are many factors that can influence the results: the model, the system prompt. And on top of that, you still have to worry about performance, reliability, and costs. </p><p>Solving performance, reliability and observability challenges is exactly what Cloudflare was built for, and two years ago, with the introduction of AI Gateway, we wanted to extend to our users the same levels of control in the age of AI. </p><p>Today, we’re excited to announce several features to make building AI applications easier and more manageable: unified billing, secure key storage, dynamic routing, security controls with Data Loss Prevention (DLP). This means that AI Gateway becomes your go-to place to control costs and API keys, route between different models and providers, and manage your AI traffic. Check out our new <a href="https://ai.cloudflare.com/gateway"><u>AI Gateway landing page</u></a> for more information at a glance.</p>
    <div>
      <h2>Connect to all your favorite AI providers</h2>
      <a href="#connect-to-all-your-favorite-ai-providers">
        
      </a>
    </div>
    <p>When using an AI provider, you typically have to sign up for an account, get an API key, manage rate limits, top up credits — all within an individual provider’s dashboard. Multiply that for each of the different providers you might use, and you’ll soon be left with an administrative headache of bills and keys to manage.</p><p>With <a href="https://www.cloudflare.com/developer-platform/products/ai-gateway/"><u>AI Gateway</u></a>, you can now connect to major AI providers directly through Cloudflare and manage everything through one single plane. We’re excited to partner with Anthropic, Google, Groq, OpenAI, and xAI to provide Cloudflare users with access to their models directly through Cloudflare. With this, you’ll have access to over 350+ models across 6 different providers.</p><p>You can now get billed for usage across different providers directly through your Cloudflare account. This feature is available for Workers Paid users, where you’ll be able to add credits to your Cloudflare account and use them for <a href="https://www.cloudflare.com/learning/ai/inference-vs-training/"><u>AI inference</u></a> to all the supported providers. You’ll be able to see real-time usage statistics and manage your credits through the AI Gateway dashboard. Your AI Gateway inference usage will also be documented in your monthly Cloudflare invoice. No more signing up and paying for each individual model provider account. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4t2j5frheaYOLznprTL58p/f0fb4c6de2aad70c82a23bc35873ea50/image1.png" />
          </figure><p>Usage rates are based on then-current list prices from model providers — all you will need to cover is the transaction fee as you load credits into your account. Since this is one of the first times we’re launching a credits based billing system at Cloudflare, we’re releasing this feature in Closed Beta — sign up for access <a href="https://forms.gle/3LGAzN2NDXqtbjKR9"><u>here</u></a>.</p>
    <div>
      <h3>BYO Provider Keys, now with Cloudflare Secrets Store</h3>
      <a href="#byo-provider-keys-now-with-cloudflare-secrets-store">
        
      </a>
    </div>
    <p>Although we’ve introduced unified billing, some users might still want to manage their own accounts and keys with providers. We’re happy to say that AI Gateway will continue supporting our <a href="https://developers.cloudflare.com/ai-gateway/configuration/bring-your-own-keys/"><u>BYO Key feature, </u></a>improving the experience of BYO Provider Keys by integrating with Cloudflare’s secrets management product <a href="https://developers.cloudflare.com/secrets-store/"><u>Secrets Store</u></a>. Now, you can seamlessly and securely store your keys in one centralized location and distribute them without relying on plain text. Secrets Store uses a two level key hierarchy with AES encryption to ensure that your secret stays safe, while maintaining low latency through our global configuration system, <a href="https://blog.cloudflare.com/quicksilver-v2-evolution-of-a-globally-distributed-key-value-store-part-1/"><u>Quicksilver</u></a>.</p><p>You can now save and manage keys directly through your AI Gateway dashboard or through the Secrets Store <a href="http://dash.cloudflare.com/?to=/:account/secrets-store"><u>dashboard</u></a>, <a href="https://developers.cloudflare.com/api/resources/secrets_store/subresources/stores/subresources/secrets/methods/create/"><u>API</u></a>, or <a href="https://developers.cloudflare.com/workers/wrangler/commands/#secrets-store-secret"><u>Wrangler</u></a> by using the new <b>AI Gateway</b> <b>scope</b>. Scoping your secrets to AI Gateway ensures that only this specific service will be able to access your keys, meaning that secret could not be used in a Workers binding or anywhere else on Cloudflare’s platform.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6hiSSQi2lQGWQnGYe4e9p1/dadc4fde865010d9e263badb75847992/2.png" />
          </figure><p>You can pass your AI provider keys without including them directly in the request header. Instead of including the actual value, you can deploy the secret only using the Secrets Store reference: </p>
            <pre><code>curl -X POST https://gateway.ai.cloudflare.com/v1/&lt;ACCOUNT_ID&gt;/my-gateway/anthropic/v1/messages \
 --header 'cf-aig-authorization: CLOUDFLARE_AI_GATEWAY_TOKEN \
 --header 'anthropic-version: 2023-06-01' \
 --header 'Content-Type: application/json' \
 --data  '{"model": "claude-3-opus-20240229", "messages": [{"role": "user", "content": "What is Cloudflare?"}]}'</code></pre>
            <p>Or, using Javascript: </p>
            <pre><code>import Anthropic from '@anthropic-ai/sdk';


const anthropic = new Anthropic({
  apiKey: "CLOUDFLARE_AI_GATEWAY_TOKEN",
  baseURL: "https://gateway.ai.cloudflare.com/v1/&lt;ACCOUNT_ID&gt;/my-gateway/anthropic",
});


const message = await anthropic.messages.create({
  model: 'claude-3-opus-20240229',
  messages: [{role: "user", content: "What is Cloudflare?"}],
  max_tokens: 1024
});</code></pre>
            <p>By using Secrets Store to deploy your secrets, you no longer need to give every developer access to every key — instead, you can rely on Secrets Store’s <a href="https://developers.cloudflare.com/secrets-store/access-control/"><u>role-based access control</u></a> to further lock down these sensitive values. For example, you might want your security administrators to have Secrets Store admin permissions so that they can create, update, and delete the keys when necessary. With Cloudflare <a href="https://developers.cloudflare.com/logs/logpush/logpush-job/datasets/account/audit_logs/?cf_target_id=1C767B900C4419A313C249A5D99921FB"><u>audit logging</u></a>, all such actions will be logged so you know exactly who did what and when. Your developers, on the other hand, might only need Deploy permissions, so they can reference the values in code, whether that is a Worker or AI Gateway or both. This way, you reduce the risk of the secret getting leaked accidentally or intentionally by a malicious actor. This also allows you to update your provider keys in one place and automatically propagate that value to any AI Gateway using those values, simplifying the management. </p>
    <div>
      <h3>Unified Request/Response</h3>
      <a href="#unified-request-response">
        
      </a>
    </div>
    <p>We made it super easy for people to try out different AI models – but the developer experience should match that as well. We found that each provider can have slight differences in how they expect people to send their requests, so we’re excited to launch an automatic translation layer between providers. When you send a request through AI Gateway, it just works – no matter what provider or model you use.</p>
            <pre><code>import OpenAI from "openai";
const client = new OpenAI({
  apiKey: "YOUR_PROVIDER_API_KEY", // Provider API key
  // NOTE: the OpenAI client automatically adds /chat/completions to the end of the URL, you should not add it yourself.
  baseURL:
    "https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/compat",
});

const response = await client.chat.completions.create({
  model: "google-ai-studio/gemini-2.0-flash",
  messages: [{ role: "user", content: "What is Cloudflare?" }],
});

console.log(response.choices[0].message.content);</code></pre>
            
    <div>
      <h2>Dynamic Routes</h2>
      <a href="#dynamic-routes">
        
      </a>
    </div>
    <p>When we first launched <a href="https://www.cloudflare.com/developer-platform/products/workers/"><u>Cloudflare Workers</u></a>, it was an easy way for people to intercept HTTP requests and customize actions based on different attributes. We think the same customization is necessary for AI traffic, so we’re launching <a href="https://developers.cloudflare.com/ai-gateway/features/dynamic-routing/"><u>Dynamic Routes</u></a> in AI Gateway.</p><p>Dynamic Routes allows you to define certain actions based on different request attributes. If you have free users, maybe you want to ratelimit them to a certain request per second (RPS) or a certain dollar spend. Or maybe you want to conduct an A/B test and split 50% of traffic to Model A and 50% of traffic to Model B. You could also want to chain several models in a row, like adding custom guardrails or enhancing a prompt before it goes to another model. All of this is possible with Dynamic Routes!</p><p>We’ve built a slick UI in the AI Gateway dashboard where you can define simple if/else interactions based on request attributes or a percentage split. Once you define a route, you’ll use the route as the “model” name in your input JSON and we will manage the traffic as you defined. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7qLp4KT8ASCLRv2pyM2kxR/3151e32afa4d8447ae07a5a8fb09a9b6/3.png" />
          </figure>
            <pre><code>import OpenAI from "openai";

const cloudflareToken = "CF_AIG_TOKEN";
const accountId = "{account_id}";
const gatewayId = "{gateway_id}";
const baseURL = `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}`;

const openai = new OpenAI({
  apiKey: cloudflareToken,
  baseURL,
});

try {
  const model = "dynamic/&lt;your-dynamic-route-name&gt;";
  const messages = [{ role: "user", content: "What is a neuron?" }];
  const chatCompletion = await openai.chat.completions.create({
    model,
    messages,
  });
  const response = chatCompletion.choices[0].message;
  console.log(response);
} catch (e) {
  console.error(e);
}</code></pre>
            
    <div>
      <h2>Built-in security with Firewall in AI Gateway</h2>
      <a href="#built-in-security-with-firewall-in-ai-gateway">
        
      </a>
    </div>
    <p>Earlier this year we announced <a href="https://developers.cloudflare.com/changelog/2025-02-26-guardrails/"><u>Guardrails</u></a> in AI Gateway and now we’re expanding our security capabilities and include Data Loss Prevention (DLP) scanning in AI Gateway’s Firewall. With this, you can select the DLP profiles you are interested in blocking or flagging, and we will scan requests for the matching content. DLP profiles include general categories like “Financial Information”, “Social Security, Insurance, Tax and Identifier Numbers” that everyone has access to with a free Zero Trust account. If you would like to create a custom DLP profile to safeguard specific text, the upgraded Zero Trust plan allows you to create custom DLP profiles to catch sensitive data that is unique to your business.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5yti8oy4TF01EdZMtYN1If/d2f3bd804873644862fbd61b07d3574a/4.png" />
          </figure><p>False positives and grey area situations happen, we give admins controls on whether to fully block or just alert on DLP matches. This allows administrators to monitor for potential issues without creating roadblocks for their users.. Each log on AI gateway now includes details about the DLP profiles matched on your request, and the action that was taken:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2pDdqy8bVmsiyjm4sg2pkG/ff97d9069e200fb859c1dc2daed8e4fa/5.png" />
          </figure>
    <div>
      <h2>More coming soon…</h2>
      <a href="#more-coming-soon">
        
      </a>
    </div>
    <p>If you think about the history of Cloudflare, you’ll notice similar patterns that we’re following for the new vision for AI Gateway. We want developers of AI applications to be able to have simple interconnectivity, observability, security, customizable actions, and more — something that Cloudflare has a proven track record of accomplishing for global Internet traffic. We see AI Gateway as a natural extension of Cloudflare’s mission, and we’re excited to make it come to life.</p><p>We’ve got more launches up our sleeves, but we couldn’t wait to get these first handful of features into your hands. Read up about it in our <a href="https://developers.cloudflare.com/ai-gateway/"><u>developer docs</u></a>, <a href="https://developers.cloudflare.com/ai-gateway/get-started/"><u>give it a try</u></a>, and let us know what you think. If you want to explore larger deployments, <a href="https://www.cloudflare.com/plans/enterprise/contact/?utm_medium=referral&amp;utm_source=blog&amp;utm_campaign=2025-q3-acq-gbl-connectivity-ge-ge-general-ai_week_blog"><u>reach out for a consultation </u></a>with Cloudflare experts.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/LTpdSaZMBbdOzASW8ggoS/6610f437d955174d7f7f1212617a4365/6.png" />
          </figure><div>
  
</div><p></p> ]]></content:encoded>
            <category><![CDATA[AI Week]]></category>
            <category><![CDATA[AI Gateway]]></category>
            <category><![CDATA[AI]]></category>
            <guid isPermaLink="false">6O1tkxTcxxG9hgxI8X9kFH</guid>
            <dc:creator>Michelle Chen</dc:creator>
            <dc:creator>Abhishek Kankani</dc:creator>
            <dc:creator>Mia Malden</dc:creator>
        </item>
        <item>
            <title><![CDATA[How we built the most efficient inference engine for Cloudflare’s network ]]></title>
            <link>https://blog.cloudflare.com/cloudflares-most-efficient-ai-inference-engine/</link>
            <pubDate>Wed, 27 Aug 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Infire is an LLM inference engine that employs a range of techniques to maximize resource utilization, allowing us to serve AI models more efficiently with better performance for Cloudflare workloads. ]]></description>
            <content:encoded><![CDATA[ <p>Inference powers some of today’s most powerful AI products: chat bot replies, <a href="https://www.cloudflare.com/learning/ai/what-is-agentic-ai/"><u>AI agents</u></a>, autonomous vehicle decisions, and fraud detection. The problem is, if you’re building one of these products on top of a hyperscaler, you’ll likely need to rent expensive GPUs from large centralized data centers to run your inference tasks. That model doesn’t work for Cloudflare — there’s a mismatch between Cloudflare’s globally-distributed network and a typical centralized AI deployment using large multi-GPU nodes. As a company that operates our own compute on a lean, fast, and widely distributed network within 50ms of 95% of the world’s Internet-connected population, we need to be running inference tasks more efficiently than anywhere else.</p><p>This is further compounded by the fact that AI models are getting larger and more complex. As we started to support these models, like the Llama 4 herd and gpt-oss, we realized that we couldn’t just throw money at the scaling problems by buying more GPUs. We needed to utilize every bit of idle capacity and be agile with where each model is deployed. </p><p>After running most of our models on the widely used open source inference and serving engine <a href="https://github.com/vllm-project/vllm"><u>vLLM</u></a>, we figured out it didn’t allow us to fully utilize the GPUs at the edge. Although it can run on a very wide range of hardware, from personal devices to data centers, it is best optimized for large data centers. When run as a dedicated inference server on powerful hardware serving a specific model, vLLM truly shines. However, it is much less optimized for dynamic workloads, distributed networks, and for the unique security constraints of running inference at the edge alongside other services.</p><p>That’s why we decided to build something that will be able to meet the needs of Cloudflare inference workloads for years to come. Infire is an LLM inference engine, written in Rust, that employs a range of techniques to maximize memory, network I/O, and GPU utilization. It can serve more requests with fewer GPUs and significantly lower CPU overhead, saving time, resources, and energy across our network. </p><p>Our initial benchmarking has shown that Infire completes inference tasks up to 7% faster than vLLM 0.10.0 on unloaded machines equipped with an H100 NVL GPU. On infrastructure under real load, it performs significantly better. </p><p>Currently, Infire is powering the Llama 3.1 8B model for <a href="https://developers.cloudflare.com/workers-ai/"><u>Workers AI</u></a>, and you can test it out today at <a href="https://developers.cloudflare.com/workers-ai/models/llama-3.1-8b-instruct-fast/"><u>@cf/meta/llama-3.1-8b-instruct</u></a>!</p>
    <div>
      <h2>The Architectural Challenge of LLM Inference at Cloudflare </h2>
      <a href="#the-architectural-challenge-of-llm-inference-at-cloudflare">
        
      </a>
    </div>
    <p>Thanks to industry efforts, inference has improved a lot over the past few years. vLLM has led the way here with the recent release of the vLLM V1 engine with features like an optimized KV cache, improved batching, and the implementation of Flash Attention 3. vLLM is great for most inference workloads — we’re currently using it for several of the models in our <a href="https://developers.cloudflare.com/workers-ai/models/"><u>Workers AI catalog</u></a> — but as our AI workloads and catalog has grown, so has our need to optimize inference for the exact hardware and performance requirements we have. </p><p>Cloudflare is writing much of our <a href="https://blog.cloudflare.com/rust-nginx-module/"><u>new infrastructure in Rust</u></a>, and vLLM is written in Python. Although Python has proven to be a great language for prototyping ML workloads, to maximize efficiency we need to control the low-level implementation details. Implementing low-level optimizations through multiple abstraction layers and Python libraries adds unnecessary complexity and leaves a lot of CPU performance on the table, simply due to the inefficiencies of Python as an interpreted language.</p><p>We love to contribute to open-source projects that we use, but in this case our priorities may not fit the goals of the vLLM project, so we chose to write a server for our needs. For example, vLLM does not support co-hosting multiple models on the same GPU without using Multi-Instance GPU (MIG), and we need to be able to dynamically schedule multiple models on the same GPU to minimize downtime. We also have an in-house AI Research team exploring unique features that are difficult, if not impossible, to upstream to vLLM. </p><p>Finally, running code securely is our top priority across our platform and <a href="https://www.cloudflare.com/developer-platform/products/workers-ai/"><u>Workers AI</u></a> is no exception. We simply can’t trust a 3rd party Python process to run on our edge nodes alongside the rest of our services without strong sandboxing. We are therefore forced to run vLLM via <a href="https://gvisor.dev"><u>gvisor</u></a>. Having an extra virtualization layer adds another performance overhead to vLLM. More importantly, it also increases the startup and tear downtime for vLLM instances — which are already pretty long. Under full load on our edge nodes, vLLM running via gvisor consumes as much as 2.5 CPU cores, and is forced to compete for CPU time with other crucial services, that in turn slows vLLM down and lowers GPU utilization as a result.</p><p>While developing Infire, we’ve been incorporating the latest research in inference efficiency — let’s take a deeper look at what we actually built.</p>
    <div>
      <h2>How Infire works under the hood </h2>
      <a href="#how-infire-works-under-the-hood">
        
      </a>
    </div>
    <p>Infire is composed of three major components: an OpenAI compatible HTTP server, a batcher, and the Infire engine itself.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3BypYSG9QFsPjPFhjlOEsa/6ef5d4ccaabcd96da03116b7a14e8439/image2.png" />
          </figure><p><i><sup>An overview of Infire’s architecture </sup></i></p>
    <div>
      <h2>Platform startup</h2>
      <a href="#platform-startup">
        
      </a>
    </div>
    <p>When a model is first scheduled to run on a specific node in one of our data centers by our auto-scaling service, the first thing that has to happen is for the model weights to be fetched from our <a href="https://www.cloudflare.com/developer-platform/products/r2/"><u>R2 object storage</u></a>. Once the weights are downloaded, they are cached on the edge node for future reuse.</p><p>As the weights become available either from cache or from R2, Infire can begin loading the model onto the GPU. </p><p>Model sizes vary greatly, but most of them are <b>large, </b>so transferring them into GPU memory can be a time-consuming part of Infire’s startup process. For example, most non-quantized models store their weights in the BF16 floating point format. This format has the same dynamic range as the 32-bit floating format, but with reduced accuracy. It is perfectly suited for inference providing the sweet spot of size, performance and accuracy. As the name suggests, the BF16 format requires 16 bits, or 2 bytes per weight. The approximate in-memory size of a given model is therefore double the size of its parameters. For example, LLama3.1 8B has approximately 8B parameters, and its memory footprint is about 16 GB. A larger model, like LLama4 Scout, has 109B parameters, and requires around 218 GB of memory. Infire utilizes a combination of <a href="https://developer.nvidia.com/blog/how-optimize-data-transfers-cuda-cc/#pinned_host_memory"><u>Page Locked</u></a> memory with CUDA asynchronous copy mechanism over multiple streams to speed up model transfer into GPU memory.</p><p>While loading the model weights, Infire begins just-in-time compiling the required kernels based on the model's parameters, and loads them onto the device. Parallelizing the compilation with model loading amortizes the latency of both processes. The startup time of Infire when loading the Llama-3-8B-Instruct model from disk is just under 4 seconds. </p>
    <div>
      <h3>The HTTP server</h3>
      <a href="#the-http-server">
        
      </a>
    </div>
    <p>The Infire server is built on top of <a href="https://docs.rs/hyper/latest/hyper/"><u>hyper</u></a>, a high performance HTTP crate, which makes it possible to handle hundreds of connections in parallel – while consuming a modest amount of CPU time. Because of ChatGPT’s ubiquity, vLLM and many other services offer OpenAI compatible endpoints out of the box. Infire is no different in that regard. The server is responsible for handling communication with the client: accepting connections, handling prompts and returning responses. A prompt will usually consist of some text, or a "transcript" of a chat session along with extra parameters that affect how the response is generated. Some parameters that come with a prompt include the temperature, which affects the randomness of the response, as well as other parameters that affect the randomness and length of a possible response.</p><p>After a request is deemed valid, Infire will pass it to the tokenizer, which transforms the raw text into a series of tokens, or numbers that the model can consume. Different models use different kinds of tokenizers, but the most popular ones use byte-pair encoding. For tokenization, we use HuggingFace's tokenizers crate. The tokenized prompts and params are then sent to the batcher, and scheduled for processing on the GPU, where they will be processed as vectors of numbers, called <a href="https://www.cloudflare.com/learning/ai/what-are-embeddings/"><u>embeddings</u></a>.</p>
    <div>
      <h2>The batcher</h2>
      <a href="#the-batcher">
        
      </a>
    </div>
    <p>The most important part of Infire is in how it does batching: by executing multiple requests in parallel. This makes it possible to better utilize memory bandwidth and caches. </p><p>In order to understand why batching is so important, we need to understand how the inference algorithm works. The weights of a model are essentially a bunch of two-dimensional matrices (also called tensors). The prompt represented as vectors is passed through a series of transformations that are largely dominated by one operation: vector-by-matrix multiplication. The model weights are so large, that the cost of the multiplication is dominated by the time it takes to fetch it from memory. In addition, modern GPUs have hardware units dedicated to matrix-by-matrix multiplications (called Tensor Cores on Nvidia GPUs). In order to amortize the cost of memory access and take advantage of the Tensor Cores, it is necessary to aggregate multiple operations into a larger matrix multiplication.</p><p>Infire utilizes two techniques to increase the size of those matrix operations. The first one is called prefill: this technique is applied to the prompt tokens. Because all the prompt tokens are available in advance and do not require decoding, they can all be processed in parallel. This is one reason why input tokens are often cheaper (and faster) than output tokens.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1pqyNSzgWLcgrV3urpCvA0/e204ac477992d591a7368632c36e97eb/image1.png" />
          </figure><p><sup><i>How Infire enables larger matrix multiplications via batching</i></sup></p><p>The other technique is called batching: this technique aggregates multiple prompts into a single decode operation.</p><p>Infire mixes both techniques. It attempts to process as many prompts as possible in parallel, and fills the remaining slots in a batch with prefill tokens from incoming prompts. This is also known as continuous batching with chunked prefill.</p><p>As tokens get decoded by the Infire engine, the batcher is also responsible for retiring prompts that reach an End of Stream token, and sending tokens back to the decoder to be converted into text. </p><p>Another job the batcher has is handling the KV cache. One demanding operation in the inference process is called <i>attention</i>. Attention requires going over the KV values computed for all the tokens up to the current one. If we had to recompute those previously encountered KV values for every new token we decode, the runtime of the process would explode for longer context sizes. However, using a cache, we can store all the previous values and re-read them for each consecutive token. Potentially the KV cache for a prompt can store KV values for as many tokens as the context window allows. In LLama 3, the maximal context window is 128K tokens. If we pre-allocated the KV cache for each prompt in advance, we would only have enough memory available to execute 4 prompts in parallel on H100 GPUs! The solution for this is paged KV cache. With paged KV caching, the cache is split into smaller chunks called pages. When the batcher detects that a prompt would exceed its KV cache, it simply assigns another page to that prompt. Since most prompts rarely hit the maximum context window, this technique allows for essentially unlimited parallelism under typical load.</p><p>Finally, the batcher drives the Infire forward pass by scheduling the needed kernels to run on the GPU.</p>
    <div>
      <h2>CUDA kernels</h2>
      <a href="#cuda-kernels">
        
      </a>
    </div>
    <p>Developing Infire gives us the luxury of focusing on the exact hardware we use, which is currently Nvidia Hopper GPUs. This allowed us to improve performance of specific compute kernels using low-level PTX instructions for this specific architecture.</p><p>Infire just-in-time compiles its kernel for the specific model it is running, optimizing for the model’s parameters, such as the hidden state size, dictionary size and the GPU it is running on. For some operations, such as large matrix multiplications, Infire will utilize the high performance cuBLASlt library, if it would deem it faster.</p><p>Infire also makes use of very fine-grained CUDA graphs, essentially creating a dedicated CUDA graph for every possible batch size on demand. It then stores it for future launch. Conceptually, a CUDA graph is another form of just-in-time compilation: the CUDA driver replaces a series of kernel launches with a single construct (the graph) that has a significantly lower amortized kernel launch cost, thus kernels executed back to back will execute faster when launched as a single graph as opposed to individual launches.</p>
    <div>
      <h2>How Infire performs in the wild </h2>
      <a href="#how-infire-performs-in-the-wild">
        
      </a>
    </div>
    <p>We ran synthetic benchmarks on one of our edge nodes with an H100 NVL GPU.</p><p>The benchmark we ran was on the widely used ShareGPT v3 dataset. We ran the benchmark on a set of 4,000 prompts with a concurrency of 200. We then compared Infire and vLLM running on bare metal as well as vLLM running under gvisor, which is the way we currently run in production. In a production traffic scenario, an edge node would be competing for resources with other traffic. To simulate this, we benchmarked vLLM running in gvisor with only one CPU available.</p><table><tr><td><p>
</p></td><td><p>requests/s</p></td><td><p>tokens/s</p></td><td><p>CPU load</p></td></tr><tr><td><p>Infire</p></td><td><p>40.91</p></td><td><p>17224.21</p></td><td><p>25%</p></td></tr><tr><td><p>vLLM 0.10.0</p></td><td><p>38.38</p></td><td><p>16164.41</p></td><td><p>140%</p></td></tr><tr><td><p>vLLM under gvisor</p></td><td><p>37.13</p></td><td><p>15637.32</p></td><td><p>250%</p></td></tr><tr><td><p>vLLM under gvisor with CPU constraints</p></td><td><p>22.04</p></td><td><p>9279.25</p></td><td><p>100%</p></td></tr></table><p>As evident from the benchmarks we achieved our initial goal of matching and even slightly surpassing vLLM performance, but more importantly, we’ve done so at a significantly lower CPU usage, in large part because we can run Infire as a trusted bare-metal process. Inference no longer takes away precious resources from our other services and we see GPU utilization upward of 80%, reducing our operational costs.</p><p>This is just the beginning. There are still multiple proven performance optimizations yet to be implemented in Infire – for example, we’re integrating Flash Attention 3, and most of our kernels don’t utilize kernel fusion. Those and other optimizations will allow us to unlock even faster inference in the near future.</p>
    <div>
      <h2>What’s next </h2>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>Running AI inference presents novel challenges and demands to our infrastructure. Infire is how we’re running AI efficiently — close to users around the world. By building upon techniques like continuous batching, a paged KV-cache, and low-level optimizations tailored to our hardware, Infire maximizes GPU utilization while minimizing overhead. Infire completes inference tasks faster and with a fraction of the CPU load of our previous vLLM-based setup, especially under the strict security constraints we require. This allows us to serve more requests with fewer resources, making requests served via Workers AI faster and more efficient.</p><p>However, this is just our first iteration — we’re excited to build in multi-GPU support for larger models, quantization, and true multi-tenancy into the next version of Infire. This is part of our goal to make Cloudflare the best possible platform for developers to build AI applications.</p><p>Want to see if your AI workloads are faster on Cloudflare? <a href="https://developers.cloudflare.com/workers-ai/"><u>Get started</u></a> with Workers AI today. </p> ]]></content:encoded>
            <category><![CDATA[AI Week]]></category>
            <category><![CDATA[LLM]]></category>
            <category><![CDATA[Workers AI]]></category>
            <guid isPermaLink="false">7Li4fkq9b4B8QlgwSmZrqE</guid>
            <dc:creator>Vlad Krasnov</dc:creator>
            <dc:creator>Mari Galicer</dc:creator>
        </item>
        <item>
            <title><![CDATA[State-of-the-art image generation Leonardo models and text-to-speech Deepgram models now available in Workers AI]]></title>
            <link>https://blog.cloudflare.com/workers-ai-partner-models/</link>
            <pubDate>Wed, 27 Aug 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ We're expanding Workers AI with new partner models from Leonardo.Ai and Deepgram. Start using state-of-the-art image generation models from Leonardo and real-time TTS and STT models from Deepgram.  ]]></description>
            <content:encoded><![CDATA[ <p>When we first launched <a href="https://www.cloudflare.com/developer-platform/products/workers-ai/"><u>Workers AI</u></a>, we made a bet that AI models would get faster and smaller. We built our infrastructure around this hypothesis, adding specialized GPUs to our datacenters around the world that can serve inference to users as fast as possible. We created our platform to be as general as possible, but we also identified niche use cases that fit our infrastructure well, such as low-latency image generation or real-time audio voice agents. To lean in on those use cases, we’re bringing on some new models that will help make it easier to develop for these applications.</p><p>Today, we’re excited to announce that we are expanding our model catalog to include closed-source partner models that fit this use case. We’ve partnered with <a href="http://leonardo.ai"><u>Leonardo.Ai</u></a> and <a href="https://deepgram.com/"><u>Deepgram</u></a> to bring their latest and greatest models to Workers AI, hosted on Cloudflare’s infrastructure. Leonardo and Deepgram both have models with a great speed-to-performance ratio that suit the infrastructure of Workers AI. We’re starting off with these great partners — but expect to expand our catalog to other partner models as well.</p><p>The benefits of using these models on Workers AI is that we don’t only have a standalone inference service, we also have an entire suite of Developer products that allow you to build whole applications around AI. If you’re building an image generation platform, you could use Workers to <a href="https://www.cloudflare.com/developer-platform/solutions/hosting/">host the application logic</a>, Workers AI to generate the images, R2 for storage, and Images for serving and transforming media. If you’re building Realtime voice agents, we offer WebRTC and WebSocket support via Workers, speech-to-text, text-to-speech, and turn detection models via Workers AI, and an orchestration layer via Cloudflare Realtime. All in all, we want to lean into use cases that we think Cloudflare has a unique advantage in, with developer tools to back it up, and make it all available so that you can build the best AI applications on top of our holistic Developer Platform.</p>
    <div>
      <h2>Leonardo Models</h2>
      <a href="#leonardo-models">
        
      </a>
    </div>
    <p><a href="https://www.leonardo.ai"><u>Leonardo.Ai</u></a> is a generative AI media lab that trains their own models and hosts a platform for customers to create generative media. The Workers AI team has been working with Leonardo for a while now and have experienced the magic of their image generation models firsthand. We’re excited to bring on two image generation models from Leonardo: @cf/leonardo/phoenix-1.0 and @cf/leonardo/lucid-origin.</p><blockquote><p><i>“We’re excited to enable Cloudflare customers a new avenue to extend and use our image generation technology in creative ways such as creating character images for gaming, generating personalized images for websites, and a host of other uses... all through the Workers AI and the Cloudflare Developer Platform.” - </i><b><i>Peter Runham</i></b><i>, CTO, </i><a href="http://leonardo.ai"><i><u>Leonardo.Ai </u></i></a></p></blockquote><p>The Phoenix model is trained from the ground up by Leonardo, excelling at things like text rendering and prompt coherence. The full image generation request took 4.89s end-to-end for a 25 step, 1024x1024 image.</p>
            <pre><code>curl --request POST \
  --url https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/leonardo/phoenix-1.0 \
  --header 'Authorization: Bearer {TOKEN}' \
  --header 'Content-Type: application/json' \
  --data '{
    "prompt": "A 1950s-style neon diner sign glowing at night that reads '\''OPEN 24 HOURS'\'' with chrome details and vintage typography.",
    "width":1024,
    "height":1024,
    "steps": 25,
    "seed":1,
    "guidance": 4,
    "negative_prompt": "bad image, low quality, signature, overexposed, jpeg artifacts, undefined, unclear, Noisy, grainy, oversaturated, overcontrasted"
}'
</code></pre>
            
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1q7ndHYrwLQqqAdX6kGEkl/96ece588cf82691fa8e8d11ece382672/BLOG-2903_2.png" />
          </figure><p>The Lucid Origin model is a recent addition to Leonardo’s family of models and is great at generating photorealistic images. The image took 4.38s to generate end-to-end at 25 steps and a 1024x1024 image size.</p>
            <pre><code>curl --request POST \
  --url https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/leonardo/lucid-origin \
  --header 'Authorization: Bearer {TOKEN}' \
  --header 'Content-Type: application/json' \
  --data '{
    "prompt": "A 1950s-style neon diner sign glowing at night that reads '\''OPEN 24 HOURS'\'' with chrome details and vintage typography.",
    "width":1024,
    "height":1024,
    "steps": 25,
    "seed":1,
    "guidance": 4,
    "negative_prompt": "bad image, low quality, signature, overexposed, jpeg artifacts, undefined, unclear, Noisy, grainy, oversaturated, overcontrasted"
}'
</code></pre>
            
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/26VKWD8ua6Pe2awQWRnF7n/bb42c9612b08269af4ef38df39a2ed30/BLOG-2903_3.png" />
          </figure>
    <div>
      <h2>Deepgram Models</h2>
      <a href="#deepgram-models">
        
      </a>
    </div>
    <p>Deepgram is a voice AI company that develops their own audio models, allowing users to interact with AI through a natural interface for humans: voice. Voice is an exciting interface because it carries higher bandwidth than text, because it has other speech signals like pacing, intonation, and more. The Deepgram models that we’re bringing on our platform are audio models which perform extremely fast speech-to-text and text-to-speech inference. Combined with the Workers AI infrastructure, the models showcase our unique infrastructure so customers can build low-latency voice agents and more.</p><blockquote><p><i>"By hosting our voice models on Cloudflare's Workers AI, we're enabling developers to create real-time, expressive voice agents with ultra-low latency. Cloudflare's global network brings AI compute closer to users everywhere, so customers can now deliver lightning-fast conversational AI experiences without worrying about complex infrastructure." - </i><i><b>Adam Sypniewski</b></i><i>, CTO, Deepgram</i></p></blockquote><p><a href="https://developers.cloudflare.com/workers-ai/models/nova-3"><u>@cf/deepgram/nova-3</u></a> is a speech-to-text model that can quickly transcribe audio with high accuracy. <a href="https://developers.cloudflare.com/workers-ai/models/aura-1"><u>@cf/deepgram/aura-1</u></a> is a text-to-speech model that is context aware and can apply natural pacing and expressiveness based on the input text. The newer Aura 2 model will be available on Workers AI soon. We’ve also improved the experience of sending binary mp3 files to Workers AI, so you don’t have to convert it into an Uint8 array like you had to previously. Along with our Realtime announcements (coming soon!), these audio models are the key to enabling customers to build voice agents directly on Cloudflare.</p><p>With the AI binding, a call to the Nova 3 speech-to-text model would look like this:</p>
            <pre><code>const URL = "https://www.some-website.com/audio.mp3";
const mp3 = await fetch(URL);
 
const res = await env.AI.run("@cf/deepgram/nova-3", {
    "audio": {
      body: mp3.body,
      contentType: "audio/mpeg"
    },
    "detect_language": true
  });
</code></pre>
            <p>With the REST API, it would look like this:</p>
            <pre><code>curl --request POST \
  --url 'https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/deepgram/nova-3?detect_language=true' \
  --header 'Authorization: Bearer {TOKEN}' \
  --header 'Content-Type: audio/mpeg' \
  --data-binary @/path/to/audio.mp3</code></pre>
            <p>As well, we’ve added WebSocket support to the Deepgram models, which you can use to keep a connection to the inference server live and use it for bi-directional input and output. To use the Nova model with WebSocket support, check out our <a href="https://developers.cloudflare.com/workers-ai/models/nova-3"><u>Developer Docs</u></a>.</p><p>All the pieces work together so that you can:</p><ol><li><p><b>Capture audio</b> with Cloudflare Realtime from any WebRTC source</p></li><li><p><b>Pipe it</b> via WebSocket to your processing pipeline</p></li><li><p><b>Transcribe</b> with audio ML models Deepgram running on Workers AI</p></li><li><p><b>Process</b> with your LLM of choice through a model hosted on Workers AI or proxied via <a href="https://developers.cloudflare.com/ai-gateway/"><u>AI Gateway</u></a></p></li><li><p><b>Orchestrate</b> everything with Realtime Agents</p></li></ol>
    <div>
      <h2>Try these models out today</h2>
      <a href="#try-these-models-out-today">
        
      </a>
    </div>
    <p>Check out our<a href="https://developers.cloudflare.com/workers-ai/"><u> developer docs</u></a> for more details, pricing and how to get started with the newest partner models available on Workers AI.</p><div>
  
</div><p></p> ]]></content:encoded>
            <category><![CDATA[AI Week]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <category><![CDATA[Workers AI]]></category>
            <guid isPermaLink="false">35N861jwJHF4GEiRCDxWP</guid>
            <dc:creator>Michelle Chen</dc:creator>
            <dc:creator>Nikhil Kothari</dc:creator>
        </item>
        <item>
            <title><![CDATA[How Cloudflare runs more AI models on fewer GPUs: A technical deep-dive ]]></title>
            <link>https://blog.cloudflare.com/how-cloudflare-runs-more-ai-models-on-fewer-gpus/</link>
            <pubDate>Wed, 27 Aug 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare built an internal platform called Omni. This platform uses lightweight isolation and memory over-commitment to run multiple AI models on a single GPU. ]]></description>
            <content:encoded><![CDATA[ <p>As the demand for AI products grows, developers are creating and tuning a wider variety of models. While adding new models to our <a href="https://developers.cloudflare.com/workers-ai/models/"><u>growing catalog</u></a> on Workers AI, we noticed that not all of them are used equally – leaving infrequently used models occupying valuable GPU space. Efficiency is a core value at Cloudflare, and with GPUs being the scarce commodity they are, we realized that we needed to build something to fully maximize our GPU usage.</p><p>Omni is an internal platform we’ve built for running and managing AI models on Cloudflare’s edge nodes. It does so by spawning and managing multiple models on a single machine and GPU using lightweight isolation. Omni makes it easy and efficient to run many small and/or low-volume models, combining multiple capabilities by:  </p><ul><li><p>Spawning multiple models from a single control plane,</p></li><li><p>Implementing lightweight process isolation, allowing models to spin up and down quickly,</p></li><li><p>Isolating the file system between models to easily manage per-model dependencies, and</p></li><li><p>Over-committing GPU memory to run more models on a single GPU.</p></li></ul><p>Cloudflare aims to place GPUs as close as we possibly can to people and applications that are using them. With Omni in place, we’re now able to run more models on every node in our network, improving model availability, minimizing latency, and reducing power consumed by idle GPUs.</p><p>Here’s how. </p>
    <div>
      <h2>Omni’s architecture – at a glance</h2>
      <a href="#omnis-architecture-at-a-glance">
        
      </a>
    </div>
    <p>At a high level, Omni is a platform to run AI models. When an <a href="https://www.cloudflare.com/learning/ai/inference-vs-training/"><u>inference</u></a> request is made on Workers AI, we load the model’s configuration from <a href="https://developers.cloudflare.com/kv/"><u>Workers KV</u></a> and our routing layer forwards it to the closest Omni instance that has available capacity. For inferences using the <a href="https://developers.cloudflare.com/workers-ai/features/batch-api/"><u>Asynchronous Batch API</u></a>, we route to an Omni instance that is idle, which is typically in a location where it’s night.</p><p>Omni runs a few checks on the inference request, runs model specific pre and post processing, then hands the request over to the model.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4zlObplZsGgpxPyUoD5NXe/ddd1cb8af444460d54fa5e0ab6e58c87/1.png" />
          </figure>
    <div>
      <h2>Elastic scaling by spawning multiple models from a single control plane</h2>
      <a href="#elastic-scaling-by-spawning-multiple-models-from-a-single-control-plane">
        
      </a>
    </div>
    <p>If you’re developing an AI application, a typical setup is having a container or a VM dedicated to running a single model with a GPU attached to it. This is simple. But it’s also heavy-handed — because it requires managing the entire stack from provisioning the VM, installing GPU drivers, downloading model weights, and managing the Python environment. At scale, managing infrastructure this way is incredibly time consuming and often requires an entire team. </p><p>If you’re using Workers AI, we handle all of this for you. Omni uses a single control plane for running multiple models, called the scheduler, which automatically provisions models and spawns new instances as your traffic scales. When starting a new model instance, it downloads model weights, Python code, and any other dependencies. Omni’s scheduler provides fine-grained control and visibility over the model’s lifecycle: it receives incoming inference requests and routes them to the corresponding model processes, being sure to distribute the load between multiple GPUs. It then makes sure the model processes are running, rolls out new versions as they are released, and restarts itself when detecting errors or failure states. It also collects metrics for billing and emits logs.</p><p>The inference itself is done by a per-model process, supervised by the scheduler. It receives the inference request and some metadata, then sends back a response. Depending on the model, the response can be various types; for instance, a JSON object or a SSE stream for text generation, or binary for image generation.</p><p>The scheduler and the child processes communicate by passing messages over Inter-Process Communication (IPC). Usually the inference request is buffered in the scheduler for applying features, like prompt templating or tool calling, before the request is passed to the child process. For potentially large binary requests, the scheduler hands over the underlying TCP connection to the child process for consuming the request body directly.</p>
    <div>
      <h2>Implementing lightweight process and Python isolation</h2>
      <a href="#implementing-lightweight-process-and-python-isolation">
        
      </a>
    </div>
    <p>Typically, deploying a model requires its own dedicated container, but we want to colocate more models on a single container to conserve memory and GPU capacity. In order to do so, we needed finer-grained controls over CPU memory and the ability to isolate a model from its dependencies and environment. We deploy Omni in two configurations; a container running multiple models or bare metal running a single model. In both cases, process isolation and Python virtual environments allow us to isolate models with different dependencies by creating namespaces and are limited by <a href="https://en.wikipedia.org/wiki/Cgroups"><u>cgroups</u></a>. </p><p>Python doesn’t take into account cgroups memory limits for memory allocations, which can lead to OOM errors. Many AI Python libraries rely on <a href="https://pypi.org/project/psutil/"><u>psutil</u></a> for pre-allocating CPU memory. psutil reads /proc/meminfo to determine how much memory is available. Since in Omni each model has its own configurable memory limits, we need psutil to reflect the current usage and limits for a given model, not for the entire system.</p><p>The solution for us was to create a virtual file system, using <a href="https://en.wikipedia.org/wiki/Filesystem_in_Userspace"><u>fuse</u></a>, to mount our own version of /proc/meminfo which reflects the model’s current usage and limits.</p><p>To illustrate this, here’s an Omni instance running a model (running as pid 8). If we enter the mount namespace and look at /proc/meminfo it will reflect the model’s configuration:</p>
            <pre><code># Enter the mount (file system) namespace of a child process
$ nsenter -t 8 -m

$ mount
...
none /proc/meminfo fuse ...

$ cat /proc/meminfo
MemTotal:     7340032 kB
MemFree:     7316388 kB
MemAvailable:     7316388 kB</code></pre>
            <p>In this case the model has 7Gib of memory available and the entire container 15Gib. If the model tries to allocate more than 7Gib of memory, it will be OOM killed and restarted by the scheduler’s process manager, without causing any problems to the other models.</p><p>For isolating Python and some system dependencies, each model runs in a Python virtual environment, managed by <a href="https://docs.astral.sh/uv/"><u>uv</u></a>. Dependencies are cached on the machine and, if possible, shared between models (uv uses symbolic links between its cache and virtual environments).</p><p>Also separated processes for models allows to have different CUDA contexts and isolation for error recovery. </p>
    <div>
      <h2>Over-committing memory to run more models on a single GPU</h2>
      <a href="#over-committing-memory-to-run-more-models-on-a-single-gpu">
        
      </a>
    </div>
    <p>Some models don’t receive enough traffic to fully utilize a GPU, and with Omni we can pack more models on a single GPU, freeing up capacity for other workloads. When it comes to GPU memory management, Omni has two main jobs: safely over-commit GPU memory, so that more models than normal can share a single GPU, and enforce memory limits, to prevent any single model from running out of memory while running.      </p><p>Over-committing memory means allocating more memory than is physically available to the device. </p><p>For example, if a GPU has 10 Gib of memory, Omni would allow 2 models of 10Gib each on that GPU.</p><p>Right now, Omni is configured to run 13 models and is allocating about 400% GPU memory on a single GPU, saving up 4 GPUs. Omni does this by injecting a CUDA stub library that intercepts CUDA memory allocations (cuMalloc* or cudaMalloc*) calls and forces memory allocations to be performed in <a href="https://developer.nvidia.com/blog/unified-memory-in-cuda-6/"><u>unified memory mode</u></a>.</p><p>In Unified memory mode CUDA shares the same memory address space for both the GPU and the CPU:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2G5zd0TDi15ZeFAmcJy812/1b292429140ec2c4bd0a81bee4954150/2.png" />
          </figure><p><sup><i>CUDA’s </i></sup><a href="https://developer.nvidia.com/blog/maximizing-unified-memory-performance-cuda/"><sup><i><u>unified memory mode</u></i></sup></a><sup><i> </i></sup></p><p>In practice this is what memory over-commitment looks like: imagine 3 models (A, B and C). Models A+B fit in the GPU’s memory but C takes up the entire memory.</p><ol><li><p>Models A+B are loaded first and are in GPU memory, while model C is in CPU memory</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/xU141x0PaZRp83XlF6hWz/527915ee03309f619a64e6b43c62cd92/3.png" />
          </figure></li><li><p>Omni receives a request for model C so models A+B are swapped out and C is swapped in.
</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4fD3Y2xyyawGmo1gpdLsQz/1cd36ebaed6b7f9e95b3d31ead1c1098/4.png" />
          </figure></li><li><p>Omni receives a request for model B, so model C is partly swapped out and model B is swapped back in.
</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2v5JjDW0NCkVUfEBXwIgpL/62009bc970b0967a850cb31ef87be44b/5.png" />
          </figure></li><li><p>Omni receives a request for model A, so model A is swapped back in and model C is completely swapped out.</p></li></ol>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3cWGbEGgv3QckT7jgUIs9d/2c500a432be451a83dce0c71ccdcb89f/6.png" />
          </figure><p>The trade-off is added latency: if performing an inference requires memory that is currently on the host system, it must be transferred to the GPU. For smaller models, this latency is minimal, because with PCIe 4.0, the physical bus between your GPU and system, provides 32 GB/sec of bandwidth. On the other hand, if a model need to be “cold started” i.e. it’s been swapped out because it hasn’t been used in a while, the system may need to swap back the entire model – a larger sized model, for example, might use 5Gib of GPU memory for weights and caches, and would take ~156ms to be swapped back into the GPU. Naturally, over time, inactive models are put into CPU memory, while active models stay hot in the GPU.</p><p>Rather than allowing the model to choose how much GPU memory it uses, AI frameworks tend to pre-allocate as much GPU memory as possible for performance reasons, making co-locating models more complicated. Omni allows us to control how much memory is actually exposed to any given model to prevent a greedy model from over-using the GPU allocated to it. We do this by overriding the CUDA runtime and driver APIs (<a href="https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1g376b97f5ab20321ca46f7cfa9511b978"><u>cudaMemGetInfo</u></a> and <a href="https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g808f555540d0143a331cc42aa98835c0"><u>cuMemGetInfo</u></a>). Instead of exposing the entire GPU memory, we only expose a subset of memory to each model.</p>
    <div>
      <h2>How Omni runs multiple models for Workers AI </h2>
      <a href="#how-omni-runs-multiple-models-for-workers-ai">
        
      </a>
    </div>
    <p>AI models can run in a variety of inference engines or backends: <a href="https://github.com/vllm-project/vllm"><u>vLLM</u></a>, Python, and now our very own inference engine, <a href="http://blog.cloudflare.com/cloudflares-most-efficient-ai-inference-engine/"><u>Infire</u></a>. While models have different capabilities, each model needs to support <a href="https://developers.cloudflare.com/workers-ai/"><u>Workers AI features</u></a>, like batching and function calling. Omni acts as a unified layer for integrating these systems. It integrates into our internal routing and scheduling systems, and provides a Python API for our engineering team to add new models more easily. Let’s take a closer look at how Omni does this in practice:</p>
            <pre><code>from omni import Response
import cowsay


def handle_request(request, context):
    try:
        json = request.body.json
        text = json["text"]
    except Exception as err:
        return Response.error(...)

    return cowsay.get_output_string('cow', text)</code></pre>
            <p>Similar to how a JavaScript Worker works, Omni calls a request handler, running the model’s logic and returning a response. </p><p>Omni installs Python dependencies at model startup. We run an internal Python registry and mirror the public registry. In either case we declare dependencies in requirements.txt:</p>
            <pre><code>cowsay==6.1</code></pre>
            <p>The handle_request function can be async and return different Python types, including <a href="https://docs.pydantic.dev/latest/"><u>pydantic</u></a> objects. Omni will convert the return value into a Workers AI response for the eyeball.</p><p>A Python package is injected, named omni, containing all the Python APIs to interact with the request, the Workers AI systems, building Responses, error handling, etc. Internally we publish it as regular Python package to be used in standalone, for unit testing for instance:</p>
            <pre><code>from omni import Context, Request
from model import handle_request


def test_basic():
    ctx = Context.inactive()
    req = Request(json={"text": "my dog is cooler than you!"})
    out = handle_request(req, ctx)
    assert out == """  __________________________
| my dog is cooler than you! |
  ==========================
                          \\
                           \\
                             ^__^
                             (oo)\\_______
                             (__)\\       )\\/\\
                                 ||----w |
                                 ||     ||"""</code></pre>
            
    <div>
      <h2>What’s next </h2>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>Omni allows us to run models more efficiently by spawning them from a single control plane and implementing lightweight process isolation. This enables quick starting and stopping of models, isolated file systems for managing Python and system dependencies, and over-committing GPU memory to run more models on a single GPU. This improves the performance for our entire Workers AI stack, reduces the cost of running GPUs, and allows us to ship new models and features quickly and safely.</p><p>Right now, Omni is running in production on a handful of models in the Workers AI catalog, and we’re adding more every week. Check out <a href="https://developers.cloudflare.com/workers-ai/"><u>Workers AI</u></a> today to experience Omni’s performance benefits on your AI application. </p> ]]></content:encoded>
            <category><![CDATA[AI Week]]></category>
            <category><![CDATA[AI]]></category>
            <guid isPermaLink="false">KjxPspfQBaaHQ5K8ALjv8</guid>
            <dc:creator>Sven Sauleau</dc:creator>
            <dc:creator>Mari Galicer</dc:creator>
        </item>
        <item>
            <title><![CDATA[Securing the AI Revolution: Introducing Cloudflare MCP Server Portals]]></title>
            <link>https://blog.cloudflare.com/zero-trust-mcp-server-portals/</link>
            <pubDate>Tue, 26 Aug 2025 14:05:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare MCP Server Portals are now available in Open Beta. MCP Server Portals are a new capability that enable you to centralize, secure, and observe every MCP connection in your organization. ]]></description>
            <content:encoded><![CDATA[ 
    <div>
      <h3><b>Securing the AI Revolution: Introducing Cloudflare MCP Server Portals</b></h3>
      <a href="#securing-the-ai-revolution-introducing-cloudflare-mcp-server-portals">
        
      </a>
    </div>
    <p><a href="https://www.cloudflare.com/learning/ai/what-is-large-language-model/"><u>Large Language Models (LLMs)</u></a> are rapidly evolving from impressive information retrieval tools into active, intelligent agents. The key to unlocking this transformation is the <b>Model Context Protocol (MCP)</b>, an open-source standard that allows LLMs to securely connect to and interact with any application — from Slack to Canva, to your own internal databases.</p><p>This is a massive leap forward. With MCP, an LLM client like Gemini, Claude, or ChatGPT can answer more than just "tell me about Slack." You can ask it: "What were the most critical engineering P0s in Jira from last week, and what is the current sentiment in the #engineering-support Slack channel regarding them? Then propose updates and bug fixes to merge."</p><p>This is the power of MCP: turning models into teammates.</p><p>But this great power comes with proportional risk. Connecting LLMs to your most critical applications creates a new, complex, and largely unprotected <a href="https://www.cloudflare.com/learning/security/what-is-an-attack-surface/"><u>attack surface</u></a>. Today, we change that. We’re excited to announce Cloudflare <b>MCP Server Portals</b> are now available in Open Beta. MCP Server Portals are a new capability that enable you to centralize, secure, and observe every MCP connection in your organization. This feature is part of <a href="https://www.cloudflare.com/zero-trust/"><u>Cloudflare One</u></a>, our <a href="https://www.cloudflare.com/learning/access-management/what-is-sase/"><u>secure access service edge (SASE)</u></a> platform that helps connect and protect your workspace.</p>
    <div>
      <h3><b>What Exactly is the Model Context Protocol?</b></h3>
      <a href="#what-exactly-is-the-model-context-protocol">
        
      </a>
    </div>
    <p>Think of <a href="https://www.cloudflare.com/learning/ai/what-is-model-context-protocol-mcp/"><u>MCP</u></a> as a universal translator or a digital switchboard for AI. It’s a standardized set of rules that lets two very different types of software—LLMs and everyday applications—talk to each other effectively. It consists of two primary components:</p><ul><li><p><b>MCP Clients:</b> These are the LLMs you interact with, like ChatGPT, Claude, or Gemini. The client is the front end to the AI that you use to ask questions and give commands.</p></li><li><p><b>MCP Servers:</b> These can be developed for any application you want to connect to your LLM. SaaS providers like Slack or Atlassian may offer MCP servers for their products, or your own developers can also build custom ones for internal tools.</p></li></ul>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4Du5DBczqtDdq3qhNPbQWt/479d741dcef445f73b5da82e716fdd32/image3.png" />
          </figure><p>Credit: <a href="https://modelcontextprotocol.io/docs/learn/architecture"><u>Architecture Overview - Model Context Protocol</u></a></p><p>For a useful connection, MCP relies on a few other key concepts:</p><ul><li><p><b>Resources:</b> A mechanism for the server to give the LLM context. This could be a specific file, a database schema, or a list of users in an application.</p></li><li><p><b>Prompts:</b> Standardized questions the server can ask the client to get the information it needs to fulfill a request (e.g., "Which user do you want to search for?").</p></li><li><p><b>Tools:</b> These are the actions the client can ask the server to perform, like querying a database, calling an API, or sending a message.</p></li></ul><p>Without MCP, your LLM is isolated. With MCP, it's integrated, capable of interacting with your entire software ecosystem in a structured and predictable way.</p>
    <div>
      <h3><b>The Peril of an Unsecured AI Ecosystem</b></h3>
      <a href="#the-peril-of-an-unsecured-ai-ecosystem">
        
      </a>
    </div>
    <p>Think of an LLM as the most brilliant and enthusiastic junior hire you've ever had. They have boundless energy and can produce incredible work, but they lack the years of judgment to know what they <i>shouldn't</i> do. The current, decentralized approach to MCP is like giving that junior hire a master key to every office and server room on their first day.</p><p>It's not a matter of <i>if</i> something will go wrong, but <i>when</i>.</p><p>This "shadow AI" infrastructure is the modern equivalent of the early Internet, where every server had a public IP address, fully exposed to the world. It’s the Wild West of unmanaged connections, impossible to secure. And the risks go far beyond accidental data deletion. Attackers are actively exploiting the unique vulnerabilities of LLM-driven ecosystems:</p><ul><li><p><b>Prompt and tool injection:</b> This is more than just telling a model to "ignore previous instructions." Attackers are now hiding malicious commands inside the descriptions of MCP tools themselves. Consider an LLM seeking to use a seemingly harmless "WebSearch" tool. A poisoned description could trick it into also running a query against a financial database and exfiltrating the results.</p></li><li><p><b>Supply chain attacks:</b> How can you trust the third-party MCP servers used by your teams? In mid-2025, a critical vulnerability (<a href="https://nvd.nist.gov/vuln/detail/CVE-2025-6514"><b><u>CVE-2025-6514</u></b></a>) was discovered in a popular npm package used for MCP authentication, exposing countless servers. In another incident dubbed "<b>NeighborJack</b>," security researchers found hundreds of MCP servers inadvertently exposed to the public Internet because they were bound to 0.0.0.0 without a firewall, allowing for potential OS command injection and host takeover.</p></li><li><p><b>Privilege escalation and the "confused deputy":</b> An attacker doesn't need to break your LLM; they just need to confuse it. In one documented case, an AI agent running with high-level privileges was tricked into executing SQL commands embedded in a support ticket. The agent, acting as a "confused deputy," couldn't distinguish the malicious SQL from the legitimate ticket data and dutifully executed the commands, compromising an entire database.</p></li><li><p><b>Data leakage:</b> Without centralized controls, data can bleed between systems in unexpected ways. <a href="https://www.bleepingcomputer.com/news/security/asana-warns-mcp-ai-feature-exposed-customer-data-to-other-orgs/"><u>In June 2025</u></a>, a popular team collaboration tool’s MCP integration suffered a privacy breach where a bug caused some customer information to become visible in other customers' MCP instances, forcing them to take the integration offline for two weeks.</p></li></ul>
    <div>
      <h3><b>The Solution: A Single Front Door for Your MCP Servers</b></h3>
      <a href="#the-solution-a-single-front-door-for-your-mcp-servers">
        
      </a>
    </div>
    <p>You can't protect what you can't see. <b>Cloudflare MCP Server Portals</b> solve this problem by providing a single, centralized gateway for all your MCP servers, somewhat similar to an application launcher for <a href="https://www.cloudflare.com/learning/access-management/what-is-sso/"><u>single sign-on</u></a>. Instead of developers distributing dozens of individual server endpoints, they register their servers with Cloudflare. You provide your users with a single, unified Portal endpoint to configure in their MCP client.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5gIceb6D72AwuQSNjq0eqb/25147ec57731dd2e016887d6bab33f55/image1.png" />
          </figure><p>This changes the security posture and user experience immediately. By routing all MCP traffic through Cloudflare, you get:</p><ul><li><p><b>Centralized policy enforcement:</b> You can integrate MCP Server Portals directly into Cloudflare One. This means you can enforce the same granular access policies for your AI connections that you do for your human users. Require <a href="https://www.cloudflare.com/learning/access-management/what-is-multi-factor-authentication/"><u>multi-factor authentication</u></a>, check for device posture, restrict by geography, and ensure only the right users can access specific servers and tools.</p></li><li><p><b>Comprehensive visibility and logging:</b> Who is accessing which MCP server and which toolsets are they engaging with? What prompts are being run? What tools are being invoked? Previously, this data was scattered across every individual server. Server Portals aggregate all MCP request logs into a single place, giving you the visibility needed to audit activity and detect anomalies before they become breaches.</p></li><li><p><b>A curated AI user experience based on least privilege:</b> Administrators can now review and approve MCP servers before making them available to users through a Portal. When a user authenticates through their Portal, they are only presented with the curated list of servers and tools they are authorized to use, preventing the use of unvetted or malicious third-party servers. This approach adheres to the <a href="https://www.cloudflare.com/learning/security/glossary/what-is-zero-trust/"><u>Zero Trust security</u></a> best practice of <a href="https://www.cloudflare.com/learning/access-management/principle-of-least-privilege/"><u>least privilege</u></a>.</p></li><li><p><b>Simplified user configuration: </b>Instead of having to load individual MCP server configurations into a MCP Client, users can load a single URL that pulls down all accessible MCP Servers. This drastically simplifies how many URLs need to be shared out and known by users. As new MCP Servers are added, they become dynamically available through the portal, instead of sharing each new URL on publishing of a server.</p></li></ul><p>When a user connects to their MCP Server Portal, <a href="https://www.cloudflare.com/zero-trust/products/access/"><u>Access</u></a> prompts them to authenticate with their corporate identity provider. Once authenticated, Cloudflare enforces which MCP Servers the user has access to, regardless of the underlying server’s authorization policies. </p><p>For MCP servers with domains hosted on Cloudflare, Access policies can be used to enforce the server’s direct authorization. This is done by creating an <a href="https://developers.cloudflare.com/cloudflare-one/applications/configure-apps/mcp-servers/linked-apps/"><u>OAuth server that is linked to the domain’s existing Access Application</u></a>. For MCP servers with domains outside Cloudflare and/or hosted by a third party, they require <a href="https://modelcontextprotocol.io/specification/2025-06-18/basic/authorization"><u>authorization controls</u></a> outside of Cloudflare Access, this is usually done using OAuth.</p>
    <div>
      <h3><b>The Road Ahead: What's Next for AI Security</b></h3>
      <a href="#the-road-ahead-whats-next-for-ai-security">
        
      </a>
    </div>
    <p>MCP Server Portals are a foundational step in our mission to <a href="https://www.cloudflare.com/ai-security/">secure the AI revolution</a>. This is just the beginning. In the coming months, we plan to build on this foundation by:</p><ul><li><p><b>Mechanisms to lock down MCP Servers: </b>Unless an MCP Server author enforces <a href="https://modelcontextprotocol.io/specification/2025-06-18/basic/authorization"><u>Authorization</u></a> controls, users can still technically access MCPs outside of a Portal. We will build additional enforcement mechanisms to prevent this.</p></li><li><p><b>Integrating with Firewall for AI:</b> Imagine applying the power of our <a href="https://www.cloudflare.com/application-services/products/waf/"><u>WAF</u></a> to your MCP traffic, detecting and blocking prompt injection attacks before they ever reach your servers.</p></li><li><p><b>Cloudflare hosted MCP Servers: </b>We will make it easy to deploy MCP Servers using Cloudflare’s <a href="https://www.cloudflare.com/developer-platform/products/ai-gateway/"><u>AI Gateway</u></a>. This will allow for deeper prompt filtering and controls.</p></li><li><p><b>Applying machine learning to detect abuse:</b> We will layer our own <a href="https://www.cloudflare.com/learning/ai/what-is-machine-learning/"><u>machine learning models</u></a> on top of your MCP logs to automatically identify anomalous behavior, such as unusual data exfiltration patterns or suspicious tool usage.</p></li><li><p><b>Enhancing the protocol:</b> We are committed to working with the open-source community to strengthen the MCP standard itself, contributing to a more secure and robust ecosystem for everyone.</p></li></ul><p>This is our commitment: to provide the tools you need to innovate with confidence.</p>
    <div>
      <h3><b>Get Started Today!</b></h3>
      <a href="#get-started-today">
        
      </a>
    </div>
    <p>Progress doesn't have to come at the expense of security. With MCP Server Portals, you can empower your teams to build the future with AI, safely. This is a critical piece of helping to build a better Internet, and we are excited to see what you will build with it.</p><p>MCP Server Portals are now available in Open Beta for all Cloudflare One customers. To get started, navigate to the <b>Access &gt; AI Controls</b> page in the Zero Trust Dashboard. If you don't have an account, you can <a href="https://dash.cloudflare.com/sign-up/zero-trust"><u>sign up today</u></a> and get started with up to 50 free seats or <a href="https://www.cloudflare.com/products/zero-trust/plans/enterprise/?utm_medium=referral&amp;utm_source=blog&amp;utm_campaign=2025-q3-acq-gbl-connectivity-ge-ge-general-ai_week_blog"><u>contact our experts</u></a> to explore larger deployments.</p><p>Cloudflare is also starting a user research program focused on <a href="https://www.cloudflare.com/learning/ai/what-is-ai-security/">AI security</a>. If you are interested in previews of new functionality or want to help shape our roadmap, <a href="https://www.cloudflare.com/lp/ai-security-user-research-program-2025"><u>please express your interest here</u></a>.  </p><div>
  
</div><p></p> ]]></content:encoded>
            <category><![CDATA[AI Week]]></category>
            <category><![CDATA[MCP]]></category>
            <guid isPermaLink="false">6UkXhpttlAzNjxsaKtVwje</guid>
            <dc:creator>Kenny Johnson</dc:creator>
        </item>
        <item>
            <title><![CDATA[Introducing Cloudflare Application Confidence Score For AI Applications]]></title>
            <link>https://blog.cloudflare.com/confidence-score-rubric/</link>
            <pubDate>Tue, 26 Aug 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare will provide confidence scores within our application library for Gen AI applications, allowing customers to assess their risk for employees using shadow IT.  ]]></description>
            <content:encoded><![CDATA[ 
    <div>
      <h2>Introduction</h2>
      <a href="#introduction">
        
      </a>
    </div>
    <p>The availability of SaaS and <a href="https://www.cloudflare.com/learning/ai/what-is-generative-ai/"><u>Gen AI</u></a> applications is transforming how businesses operate, boosting collaboration and productivity across teams. However, with increased productivity comes increased risk, as employees turn to unapproved SaaS and Gen AI applications, often dumping sensitive data into them for quick productivity wins. </p><p>The prevalence of “Shadow IT” and “Shadow AI” creates multiple problems for security, IT, GRC and legal teams. For example:</p><ul><li><p>Gen AI applications may train their models on user inputs, which could expose proprietary corporate information to third parties, competitors, or even through clever attacks like <a href="https://genai.owasp.org/llmrisk/llm01-prompt-injection/"><u>prompt injection</u></a>. </p></li><li><p>Applications may retain user data for long periods, share data with <a href="https://www.malwarebytes.com/blog/news/2025/02/deepseek-found-to-be-sharing-user-data-with-tiktok-parent-company-bytedance#:~:text=PIPC%20said%20that%20DeepSeek%E2%80%94an,without%20disclosure%20or%20explicit%20consent."><u>third parties</u></a>, have <a href="https://www.wiz.io/blog/38-terabytes-of-private-data-accidentally-exposed-by-microsoft-ai-researchers"><u>lax security practices</u></a>, suffer a <a href="https://www.wired.com/story/mcdonalds-ai-hiring-chat-bot-paradoxai/"><u>data breach</u></a>, or even go <a href="https://www.npr.org/2025/03/24/nx-s1-5338622/23andme-bankruptcy-genetic-data-privacy"><u>bankrupt</u></a>, leaving sensitive data exposed to the highest bidder.  </p></li><li><p>Gen AI applications may produce outputs that are biased, unsafe or incorrect, leading to <a href="https://www.europarl.europa.eu/thinktank/en/document/EPRS_ATA(2025)769509"><u>compliance violations</u></a> or <a href="https://www.bbc.com/news/world-us-canada-65735769"><u>bad</u></a> <a href="https://www.theguardian.com/media/2023/oct/31/microsoft-accused-of-damaging-guardians-reputation-with-ai-generated-poll"><u>business</u></a> <a href="https://www.reuters.com/article/world/insight-amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK0AG/"><u>decisions</u></a>.</p></li></ul><p>In spite of these problems, <a href="https://www.cloudflare.com/the-net/banning-ai/"><u>blanket bans of Gen AI</u></a> don't work. They stifle innovation and push employee usage underground. Instead, organizations need smarter controls.</p><p>Security, IT, legal and GRC teams therefore face a difficult challenge: how can you appropriately assess each third-party application, without auditing and crafting individual policies for every single one of them that your employees might decide to interact with? And with the rate at which they’re proliferating — how could you possibly hope to keep abreast of them all?</p><p>Today, we’re excited to announce that we’re helping these teams automate assessment of SaaS and Gen AI applications at scale with the introduction of our new <b>Cloudflare Application Confidence Scores. </b>Scores will soon be available as part of our new suite of <a href="https://blog.cloudflare.com/best-practices-sase-for-ai/"><u>AI Security Posture Management (AI-SPM)</u></a> features in the Cloudflare One SASE platform, enabling IT and Security administrators to identify confidence levels associated with third-party SaaS and AI applications, and ultimately write policies informed by those confidence scores. We’re starting by scoring AI applications, because that’s where the need is most urgent.</p><p>In this blog, we’ll be covering the design of our Cloudflare Application Confidence Score, focusing specifically about the features of the score and our scoring rubric.  Our current goal is to reveal the details of our scoring rubric, which is designed to be as transparent and objective as possible — while simultaneously <a href="https://www.cloudflare.com/ai-security/">helping organizations of all sizes safely adopt AI</a>, and encouraging the industry and AI providers to adopt <a href="https://www.cloudflare.com/learning/ai/what-is-ai-security/">best practices for AI safety and security</a>.  </p><p>In the future, as part of our mission to help build a better Internet, we also plan to make Cloudflare Application Confidence Scores available for free to all our customer tiers. And even if you aren’t a Cloudflare customer, you will easily be able to browse through these Scores by creating a free account on the Cloudflare <a href="https://dash.cloudflare.com/"><u>dashboard</u></a> and navigating to our new <a href="https://developers.cloudflare.com/changelog/2025-07-07-dashboard-app-library/"><u>Application Library</u></a>.  </p>
    <div>
      <h2>Transparency, not vibes</h2>
      <a href="#transparency-not-vibes">
        
      </a>
    </div>
    <p>Cloudflare Application Confidence Scores is a transparent, understandable, and accountable metric that measures app safety, security, and data protection. It’s designed to give Security, IT, legal and GRC teams a rapid way of assessing the rapidly burgeoning space of AI applications.</p><p>Scores are not based on vibes or black-box “learning algorithms” or “artificial intelligence engines”.  We avoid subjective judgments or large-scale red-teaming as those can be tough to execute reliably and consistently over time. Instead, scores will be computed against an objective rubric that we describe in detail in this blog. Our rubric will be publicly maintained and kept up to date in the Cloudflare developer docs. </p><p>Many providers of the applications that we score are also our customers and partners, so our overarching goal is to be as fair and accountable as possible. We believe that transparency will build trust in our scoring rubric and guide the industry to adopt the best practices that our scoring rubric encourages. </p>
    <div>
      <h2>Principles behind our rubric</h2>
      <a href="#principles-behind-our-rubric">
        
      </a>
    </div>
    <p>Each component of our rubric requires a simple answer based on publicly available data like privacy policies, security documentation, compliance certifications, model cards and incident reports. If something isn't publicly disclosed, we assign zero points to that component of the rubric, with no further assumptions or guesswork.  Scores are computed according to our rubric via an automated system that incorporates human oversight for accuracy.  We use crawlers to collect public information (e.g. privacy policies, compliance documents), process it using AI for extraction and to compute the resulting scores, and then send them to human analysts for a final review.   </p><p>Scores are reviewed on a periodic basis. If a vendor believes that we have mis-scored their application, they can submit supporting documentation via <a><u>app-confidence-scores@cloudflare.com</u></a>, and we will update their score if appropriate.</p><p>Scores are on a scale from 1 to 5, with 5 being the highest confidence and 1 being the most risky. We decided to use a <b>"confidence score"</b> instead of a <b>"risk score"</b> because we can express confidence in an application when it provides clear positive evidence of good security, compliance and safety practices. An application may have good practices internally, but we cannot express confidence in these practices if they are not publicly documented. Moreover, a confidence score allows us to give customers transparent information, so they can make their own informed decisions. For example, an application might get a low confidence score because it lacks a documented data retention policy. While that might be a concern for some, your organization might find it acceptable and decide to allow the application anyway.</p><p>We separately evaluate different account tiers for the same application provider, because different account tiers can provide very different levels of enterprise risk. For instance, consumer plans (e.g. ChatGPT Free) may involve training on user prompts and score lower, whereas enterprise plans (e.g. ChatGPT Enterprise) do not train on user prompts and thus score higher. </p><p>That said, we are quite opinionated about components we selected in our rubric, drawing from deep experience of our own internal product, engineering, legal, GRC, and security teams. We prioritize factors like data retention policies and encryption standards because we believe they are foundational to protecting sensitive information in an AI-driven world. We included certifications, security frameworks and model cards because they provide evidence of maturity, stability, safety and adherence with industry best practices.</p>
    <div>
      <h2>Actually, it’s really two Scores</h2>
      <a href="#actually-its-really-two-scores">
        
      </a>
    </div>
    <p>As AI applications emerge at an unprecedented pace, the problem of "Shadow AI" intensifies traditional risks associated with Shadow IT. Shadow IT applications create risk when they retain user data for long periods, have lax security practices, are financially unstable, or widely share data with third parties.  Meanwhile, AI tools create new risks when they retain and train on user prompts, or generate responses that are biased, toxic, inaccurate or unsafe. </p><p>To separate out these different risks, we provide two different Scores: </p><ul><li><p><b>Application Confidence Score</b> (5 points) covers general SaaS maturity, and</p></li><li><p><b>Gen-AI Confidence Score</b> (5 points) focused on Gen AI-specific risks.</p></li></ul><p>We chose to focus on two separate areas to make our metric extensible (so that, in the future, we can apply it to applications that are not focused on Gen AI) and to make the Scores easier to understand and reason about.   </p><p>Each Score is applied to each account tier of a given Gen AI provider. For example, here’s how we scored OpenAI's ChatGPT:</p><ul><li><p><b>ChatGPT Free (App Confidence 3.3, GenAI Confidence 1)</b> received a low score due to limited enterprise controls and higher data exposure risk since by default, input data is used for model training.</p></li><li><p><b>ChatGPT Plus (App Confidence 3.3, GenAI Confidence 3)</b> scored slightly higher as it allows users to opt out of training on their input data.</p></li><li><p><b>ChatGPT Team (App Confidence 4.3, GenAI Confidence 3)</b> improved further with added collaboration safeguards and configurable data retention windows.</p></li><li><p><b>ChatGPT Enterprise (App Confidence 4.3, GenAI Confidence 4)</b> achieved the highest score, as training on input data is disabled by default while retaining the enhanced controls from the Team tier.</p></li></ul>
    <div>
      <h2>A detailed look at our rubric</h2>
      <a href="#a-detailed-look-at-our-rubric">
        
      </a>
    </div>
    <p>We now walk through the details of the rubric behind each of our Scores.</p>
    <div>
      <h3>Application Confidence Score (5.0 Points Total)</h3>
      <a href="#application-confidence-score-5-0-points-total">
        
      </a>
    </div>
    <p>This half evaluates the app's overall maturity as a SaaS service, drawing from enterprise best practices.</p><p><b>Regulatory Compliance:</b> Checks for key certifications that signal operational maturity. We selected these because they represent proven frameworks that demonstrate a commitment to widely-adopted security and data protection best practices.</p><ul><li><p><a href="https://www.aicpa-cima.com/topic/audit-assurance/audit-and-assurance-greater-than-soc-2"><u>SOC 2</u></a>: .4 points </p></li><li><p><a href="https://eur-lex.europa.eu/eli/reg/2016/679/oj/eng"><u>GDPR</u></a>: .4 points </p></li><li><p><a href="https://www.iso.org/standard/27001"><u>ISO 27001</u></a>: .4 points </p></li></ul><p><b>Data Management Practices: </b>Focuses on how data is retained and shared to minimize exposure. These criteria were chosen as they directly impact the risk of data leaks or misuse, based on common vulnerabilities we've observed in SaaS environments and our own legal/GRC team’s experience assessing third-party SaaS applications at Cloudflare.</p><ul><li><p><b>Documented data retention window:</b>  Shorter retention limits risk.</p><ul><li><p>0 day retention: .5 points</p></li><li><p>30 day retention: .4 points</p></li><li><p>60 day retention: .3 points</p></li><li><p>90 day retention: .1 point</p></li><li><p>No documented retention window: 0 points</p></li></ul></li><li><p><b>Third-party sharing:</b> No sharing means less external exposure of enterprise data. Sharing for advertising purposes means high risk of third parties mining and using the data.</p><ul><li><p>No third-party sharing: .5 points.</p></li><li><p>Sharing only for troubleshooting/support: .25 points</p></li><li><p>Sharing for other reasons like advertising or end user targeting: 0 points</p></li></ul></li></ul><p><b>Security Controls:</b> We prioritized these because they form the foundational defenses against unauthorized access, drawing from best practices that have prevented incidents in cloud services.</p><ul><li><p>MFA support: .2 points.</p></li><li><p>Role-based access: .2 points.</p></li><li><p>Session monitoring: .2 points.</p></li><li><p>TLS 1.3: .2 points.</p></li><li><p>SSO support: .2 points.</p></li></ul><p><b>Security reports and incident history:</b> Rewards transparency and deducts for recent issues. This was included to emphasize accountability, as a history of breaches or proactive transparency often indicates how seriously a provider takes security.</p><ul><li><p>Published safety framework and bug bounty: 1 point.</p><ul><li><p>To get full points the company needs to have <b>both</b> of the following: </p><ul><li><p>A publicly accessible page (e.g., security, trust, or safety) that includes a comprehensive whitepaper, framework overview, OR detailed security documentation that covers:</p><ul><li><p>Encryption in transit and at rest</p></li><li><p>Authentication and authorization mechanisms</p></li><li><p>Network or infrastructure security design</p></li></ul></li><li><p>Incident Response Transparency - Published vulnerability disclosure or bug bounty policy OR a documented incident response process and security advisory archive.</p></li></ul></li><li><p>Example: Google has a <a href="https://bughunters.google.com/"><u>bug bounty program</u></a>, a whitepaper providing an overview of their <a href="https://cloud.google.com/docs/security/overview/whitepaper"><u>security posture</u></a>, as well as a <a href="https://transparencyreport.google.com/"><u>transparency report</u></a>. </p></li></ul></li><li><p>No commitments or weak security framework with the lack of any of the above criteria. If the company only has one of the criteria above but lacks the other they will also receive no credit: 0 points.</p><ul><li><p>Example: Lovable who has a security page but seems to lack many other parts of the criteria: https://lovable.dev/security</p></li></ul></li><li><p>If there has been a material breach in the last two years. If the company has experienced a material cybersecurity incident that resulted in the unauthorized disclosure of customer data to external parties (e.g., data posted, sold, or otherwise made accessible outside the organization). Incident must be publicly acknowledged by the company through a trust center update, press release, incident notification page, or an official regulatory filing: Full deduction to 0.</p><ul><li><p>Example: <a href="https://blog.23andme.com/articles/addressing-data-security-concerns"><u>23andMe </u></a>suffered credential stuffing attack in 2023 that resulted in the exposure of user data.</p></li></ul></li></ul><p><b>Financial Stability:</b> Gauges long-term viability of the company behind the application. We added this because a company’s financial health affects its ability to invest in ongoing security and support, and reduces the risk of sudden disruptions, corner-cutting, bankruptcy or sudden sale of user data to unknown third parties.</p><ul><li><p>Public company or private with &gt;$300M raised: .8 points.</p></li><li><p>Private with &gt;$100M raised: .5 points.</p></li><li><p>Private with &lt;$100M raised: .2 point.</p></li><li><p>Recent bankruptcy/distress (e.g. recent bankruptcy filings, major layoffs tied to funding shortfalls, failure to meet debt obligations): 0 points.</p></li></ul>
    <div>
      <h3>Gen-AI Confidence Score (5.0 Points Total)</h3>
      <a href="#gen-ai-confidence-score-5-0-points-total">
        
      </a>
    </div>
    <p>This Score zooms in on AI-specific risks, like data usage in training and input vulnerabilities.</p><p><b>Regulatory Compliance,  </b><a href="https://www.iso.org/standard/42001"><b><u>ISO 42001</u></b></a><b>:</b> ISO 42001 is a new certification for AI management systems. We chose this emerging standard because it specifically addresses <a href="https://www.cloudflare.com/the-net/building-cyber-resilience/ai-data-governance/"><u>AI governance</u></a>, filling a gap in traditional certifications and signaling forward-thinking risk management.</p><ul><li><p>ISO 42001 Compliant: 1 point.</p></li><li><p>Not ISO 42001 Compliant: 0 points.</p></li></ul><p><b>Deployment Security Model:</b> Stronger access controls get higher points. Authentication not only controls access but also enables monitoring and logging. This makes it easier to detect misuse and investigate incidents. Public, unauthenticated access is a red flag for shadow IT risk.</p><ul><li><p>Authenticated web portal or key-protected API with rate limiting: 1 point.</p></li><li><p>Unprotected public access: 0 points.</p></li></ul><p><b>Model Card:</b>  A model card is a concise document that provides essential information about an AI model, similar to a nutrition label for a food product. It is crucial for AI safety and security because it offers transparency into a model's design, training data, limitations, and potential biases, enabling developers and users to understand its risks and use it responsibly. Some leading AI providers have committed to providing model cards as public documentation of safety evaluations. We included this in our rubric to encourage the industry to broadly adopt model cards as a best practice. As the practice of model cards is further developed and standardized across the industry, we hope to incorporate more fine-grained details from model cards into our own risk scores. But for now, we only include the existence (or lack thereof) of a model card in our score.</p><ul><li><p>Has its own model card: 1 point.</p></li><li><p>Uses a model with a model card: .5 points.</p></li><li><p>None: 0 points.</p></li></ul><p><b>Training on user prompts:</b> This is one of the most important components of our score.  Models that train on user prompts are very risky because users might share sensitive corporate information in user prompts. We weighted this heavily because <a href="https://www.cloudflare.com/learning/ai/how-to-secure-training-data-against-ai-data-leaks/">control over training data</a> is central to preventing unintended data exposure, a core <a href="https://www.cloudflare.com/the-net/generative-ai-zero-trust/"><u>risk in generative AI</u></a> that can lead to major incidents.</p><ul><li><p>Explicit opt-in is required for training on user prompts: 2 points.</p></li><li><p>Opt-out of training on user prompts is explicitly available to users: 1 point.</p></li><li><p>No way to opt out of training on user prompts: 0 points.</p></li></ul><p>Here's an example of these Scores applied to a few popular AI providers.  As expected, enterprise tiers typically earn higher Confidence Scores than consumer tiers of the same AI provider.</p>
<table><thead>
  <tr>
    <th><span>Company</span></th>
    <th><span>Application Score</span></th>
    <th><span>Gen AI Score</span></th>
  </tr>
  <tr>
  </tr></thead>
<tbody>
  <tr>
    <td><span>Gemini Free</span></td>
    <td><span>3.8</span></td>
    <td><span>4.0</span></td>
  </tr>
  <tr>
    <td><span>Gemini Pro</span></td>
    <td><span>3.8</span></td>
    <td><span>5.0</span></td>
  </tr>
  <tr>
    <td><span>Gemini Ultra</span></td>
    <td><span>4.1</span></td>
    <td><span>5.0</span></td>
  </tr>
  <tr>
    <td><span>Gemini Business</span></td>
    <td><span>4.7</span></td>
    <td><span>5.0</span></td>
  </tr>
  <tr>
    <td><span>Gemini Enterprise</span></td>
    <td><span>4.7</span></td>
    <td><span>5.0</span></td>
  </tr>
  <tr>
    <td></td>
    <td></td>
    <td></td>
  </tr>
  <tr>
    <td><span>OpenAI Free</span></td>
    <td><span>3.3</span></td>
    <td><span>1.0</span></td>
  </tr>
  <tr>
    <td><span>OpenAI Plus</span></td>
    <td><span>3.3</span></td>
    <td><span>3.0</span></td>
  </tr>
  <tr>
    <td><span>OpenAI Pro</span></td>
    <td><span>3.3</span></td>
    <td><span>3.0</span></td>
  </tr>
  <tr>
    <td><span>OpenAI Team</span></td>
    <td><span>4.3</span></td>
    <td><span>3.0</span></td>
  </tr>
  <tr>
    <td><span>OpenAI Enterprise</span></td>
    <td><span>4.3</span></td>
    <td><span>4.0</span></td>
  </tr>
  <tr>
    <td></td>
    <td></td>
    <td></td>
  </tr>
  <tr>
    <td><span>Anthropic Free</span></td>
    <td><span>3.9</span></td>
    <td><span>5.0</span></td>
  </tr>
  <tr>
    <td><span>Anthropic Pro</span></td>
    <td><span>3.9</span></td>
    <td><span>5.0</span></td>
  </tr>
  <tr>
    <td><span>Anthropic Max</span></td>
    <td><span>3.9</span></td>
    <td><span>5.0</span></td>
  </tr>
  <tr>
    <td><span>Anthropic Team</span></td>
    <td><span>4.9</span></td>
    <td><span>5.0</span></td>
  </tr>
  <tr>
    <td><span>Anthropic Enterprise</span></td>
    <td><span>4.9</span></td>
    <td><span>5.0</span></td>
  </tr>
</tbody></table><p><i>Note: Confidence scores are provided "as is” for informational purposes only and should not be considered a substitute for independent analysis or decision-making. All actions taken based on the scores are the sole responsibility of the user.</i></p>
    <div>
      <h2>We’re just getting started…</h2>
      <a href="#were-just-getting-started">
        
      </a>
    </div>
    <p>We're actively refining our scoring methodology. To that end, we're collaborating with a diverse group of experts in the AI ecosystem (including researchers, legal professionals, SOC teams, and more) to fine-tune our scores, optimize for transparency, accountability and extensibility. If you have insights, suggestions, or want to get involved testing new functionality, we’d love for you to <a href="https://www.cloudflare.com/lp/ai-security-user-research-program-2025"><u>express interest in our user research program</u></a>. We'd very much welcome your feedback on this scoring rubric. </p><p>Today, we’re just releasing our scoring rubric in order to solicit feedback from the community. But soon, you'll start seeing these Cloudflare Application Confidence Scores integrated into the Application Library in our SASE platform. Customers can simply click or hover over any score to reveal a detailed breakdown of the rubric and underlying components of the score. Again, if you see any issues with our scoring, please submit your feedback to <a><u>app-confidence-scores@cloudflare.com</u></a>, and our team will review it and make adjustments if appropriate. </p><p>Looking even further ahead, we plan to enable integration of these scores directly into <a href="https://developers.cloudflare.com/cloudflare-one/policies/gateway/"><u>Cloudflare Gateway</u></a> and <a href="https://developers.cloudflare.com/cloudflare-one/policies/access/"><u>Access</u></a>, allowing our customers to write policies that block or redirect traffic, apply <a href="https://developers.cloudflare.com/cloudflare-one/policies/data-loss-prevention/"><u>data loss prevention (DLP)</u></a> or <a href="https://developers.cloudflare.com/cloudflare-one/policies/browser-isolation/"><u>remote browser isolation (RBI)</u></a> or otherwise control access to sites based directly on their Cloudflare Application Confidence Score. </p><p>This is just the beginning. By prioritizing transparency in our approach, we're not only bridging a critical gap in <a href="https://www.cloudflare.com/learning/access-management/what-is-sase/">SASE capabilities</a> but also driving the industry toward stronger AI safety practices. Let us know what you think!</p><p>If you’re ready to manage risk more effectively with these Confidence Scores, <a href="https://www.cloudflare.com/products/zero-trust/plans/enterprise/?utm_medium=referral&amp;utm_source=blog&amp;utm_campaign=2025-q3-acq-gbl-connectivity-ge-ge-general-ai_week_blog"><u>reach out to Cloudflare experts for a conversation</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[AI Week]]></category>
            <category><![CDATA[Cloudflare One]]></category>
            <category><![CDATA[Cloudflare Zero Trust]]></category>
            <category><![CDATA[SASE]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[AI-SPM]]></category>
            <guid isPermaLink="false">4U0WvN8BMpHUPypHmF1Xun</guid>
            <dc:creator>Ayush Kumar</dc:creator>
            <dc:creator>Sharon Goldberg</dc:creator>
        </item>
        <item>
            <title><![CDATA[ChatGPT, Claude, & Gemini security scanning with Cloudflare CASB]]></title>
            <link>https://blog.cloudflare.com/casb-ai-integrations/</link>
            <pubDate>Tue, 26 Aug 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare CASB now scans ChatGPT, Claude, and Gemini for misconfigurations, sensitive data exposure, and compliance issues, helping organizations adopt AI with confidence.
 ]]></description>
            <content:encoded><![CDATA[ <p>Starting today, all users of <a href="https://www.cloudflare.com/zero-trust/"><u>Cloudflare One</u></a>, our <a href="https://www.cloudflare.com/learning/access-management/what-is-sase/"><u>secure access service edge (SASE)</u></a> platform, can use our API-based <a href="https://www.cloudflare.com/zero-trust/products/casb/"><u>Cloud Access Security Broker (CASB)</u></a> to assess the security posture of their generative AI (GenAI) tools: specifically, OpenAI’s <a href="https://chatgpt.com/"><u>ChatGPT</u></a>, <a href="https://www.anthropic.com/claude"><u>Claude</u></a> by Anthropic, and Google’s <a href="https://gemini.google.com/"><u>Gemini</u></a>. Organizations can connect their GenAI accounts and within minutes, start detecting misconfigurations, <a href="https://www.cloudflare.com/learning/access-management/what-is-dlp/"><u>Data Loss Prevention (DLP)</u></a> matches, data exposure and sharing, compliance risks, and more — all without having to install cumbersome software onto user devices.</p><p>As <a href="https://www.cloudflare.com/learning/ai/what-is-generative-ai/"><u>Generative AI</u></a> adoption has exploded in the enterprise, IT and Security teams need to hustle to keep themselves abreast of newly emerging <a href="https://www.cloudflare.com/the-net/generative-ai-zero-trust/"><u> security and compliance challenges</u></a> that come alongside these powerful tools. In this rapidly changing landscape, IT and Security teams need tools that help <a href="https://www.cloudflare.com/ai-security/">enable AI adoption while still protecting the security and privacy of their enterprise networks and data</a>. </p><p>Cloudflare’s API CASB and inline CASB work together to help organizations safely adopt AI tools. The API CASB integrations provide out-of-band visibility into data at rest and security posture inside popular AI tools like ChatGPT, Claude, and Gemini. At the same time, Cloudflare Gateway provides <a href="https://blog.cloudflare.com/ai-prompt-protection"><u>in-line prompt controls</u></a> and <a href="https://blog.cloudflare.com/shadow-AI-analytics"><u>Shadow AI</u></a> identification. It applies policies and DLP to traffic as it moves to these AI providers. Together, these features give organizations a unified control plane for <a href="https://blog.cloudflare.com/best-practices-sase-for-ai/">securing their use of GenAI</a>.</p>
    <div>
      <h3>What’s new</h3>
      <a href="#whats-new">
        
      </a>
    </div>
    <p>ChatGPT, Claude and Gemini are now all live in the integrations supported by <a href="https://developers.cloudflare.com/cloudflare-one/applications/scan-apps/casb-integrations/"><u>Cloudflare’s API CASB</u></a>. These integrations are available to all Cloudflare One users, account owners can easily connect their GenAI tenants, and CASB will scan for security issues across multiple domains:</p><ul><li><p><b>Agentless Connections:</b> Connect ChatGPT, Claude, and Gemini via agentless, API‑based integrations to scan posture and data risks; no endpoint software to install.</p></li><li><p><b>Posture Management:</b> Detect insecure settings and misconfigurations that can lead to data exposure or misuse.</p></li><li><p><b>DLP Detection:</b> Identify where <a href="https://developers.cloudflare.com/cloudflare-one/policies/data-loss-prevention/"><u>sensitive data</u></a> has been uploaded in chat attachments (prompts coming soon).</p></li><li><p><b>GenAI-specific Insights:</b> Surface risks associated with the unique capability of a given AI provider's toolsets.</p></li></ul><p>Admins can now answer questions like: What are our employees doing in ChatGPT? What data is being uploaded and used in Claude? Is Gemini configured correctly in Google Workspace?</p><p>Now let’s take a closer look at each integration.</p>
    <div>
      <h3>OpenAI ChatGPT</h3>
      <a href="#openai-chatgpt">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6dO0h3q9modcmRPAQeiCOH/d8d54f5233e0026a63569b53cbb8d9a6/image2.png" />
          </figure><p>Cloudflare’s CASB integration with OpenAI’s ChatGPT scans for several types of insights, including:</p><ul><li><p><b>Capability Activation</b>: Highlights capabilities that are specific to ChatGPT’s feature set, like <a href="https://platform.openai.com/docs/actions/introduction"><u>actions</u></a>, <a href="https://platform.openai.com/docs/guides/tools-code-interpreter"><u>code execution</u></a>, <a href="https://help.openai.com/en/articles/9237897-chatgpt-search"><u>web access</u></a>.</p></li><li><p><b>External Exposure: </b>Finds chats and GPTs that are shared beyond the tenant, like GPTs shared publicly or listed on the <a href="https://openai.com/index/introducing-the-gpt-store/"><u>GPT Store</u></a>, and ties them back to their owners for quick triage.</p></li><li><p><b>Secrets, Keys and Invites</b>: Identifies API keys that aren’t rotated or are no longer used to maintain credential hygiene. Identifies over‑privileged or stale invites.</p></li><li><p><b>Sensitive Content (via DLP)</b>: Detects sensitive data (e.g. credential and secrets, financial / health information, source code, etc.) via <a href="https://developers.cloudflare.com/cloudflare-one/policies/data-loss-prevention/dlp-profiles/"><u>DLP profile</u></a> matches in uploaded chat attachments to enable targeted response.</p></li></ul>
    <div>
      <h3>Anthropic Claude</h3>
      <a href="#anthropic-claude">
        
      </a>
    </div>
    <p>For Claude, Cloudflare is able to provide the following out-of-band detections:</p><ul><li><p><b>Secrets, Keys and Invites:</b> Surfaces high‑risk invites and entitlement drift early so the least‑privilege access control stays tight. Spots unused API keys and rotation gaps before they turn into forgotten open doors.</p></li><li><p><b>Sensitive Content (via DLP)</b>: Monitors for <a href="https://developers.cloudflare.com/cloudflare-one/policies/data-loss-prevention/dlp-profiles/predefined-profiles/"><u>sensitive data</u></a> in uploaded files to help organizations safely enable Claude usage while maintaining compliance. Security teams get this information as quickly as CASB scans, giving them the visibility they need to help employees use Claude productively and securely with sensitive data.</p></li></ul><p>As Anthropic continues to expand Claude's API capabilities and features, Cloudflare will add corresponding security detections to match new functionality as it becomes available.</p>
    <div>
      <h3>Google Gemini</h3>
      <a href="#google-gemini">
        
      </a>
    </div>
    <p>Cloudflare’s detections for Google Gemini appear as part of our API CASB integration for Google Workspace:</p><ul><li><p><b>Identity &amp; MFA</b>: Identifies Gemini users and admins without MFA, leaving them prime targets for compromise. Imagine if an IT admin relied on Gemini daily to process corporate data, but their Google Workspace account lacked multi-factor authentication. One successful phishing email could give an attacker privileged access to Gemini and the wider Google Workspace environment — turning a minor oversight into an organization-wide breach. </p></li><li><p><b>License Hygiene</b>: Flags suspended accounts still holding Gemini or <a href="https://support.google.com/a/answer/16345165"><u>AI Ultra</u></a> licenses to cut cost and reduce exposure. An AI Ultra user has access to more powerful and riskier features, like <a href="https://deepmind.google/models/project-mariner/"><u>Project Mariner</u></a>, a research prototype that acts as an autonomous agent, capable of automating up to 10 tasks simultaneously across web browsers. An attacker can cause more damage by compromising an AI Ultra user, which is why we include this in our set of detections.</p></li></ul><p>The Gemini integration has a narrower scope because Google has structured their product and API differently than OpenAI or Anthropic. For organizations, Gemini is delivered as a <a href="https://workspace.google.com/"><u>Google Workspace</u></a> add-on. Enterprises enable Gemini features in Gmail, Docs, Sheets, and other Google Workspace apps through add-on licenses such as Gemini Enterprise or AI Ultra. Our CASB detections focus on identity, MFA, and license hygiene, rather than posture issues like public sharing or custom assistant publishing because Gemini does not yet provide those API endpoints.</p>
    <div>
      <h3>The Future of GenAI Posture Management</h3>
      <a href="#the-future-of-genai-posture-management">
        
      </a>
    </div>
    <p>Like countless other organizations, Cloudflare is adopting GenAI, on the same journey to make these environments even safer than they are today. We are excited to extend our management coverage to our customers so they can continue to innovate with GenAI. But looking ahead, we’re encouraged to see GenAI providers take concrete steps towards making security, compliance, and data privacy even more important tenets of their platforms.</p>
    <div>
      <h3>Secure GenAI beyond the reach of Inline Controls</h3>
      <a href="#secure-genai-beyond-the-reach-of-inline-controls">
        
      </a>
    </div>
    <p>Generative AI adoption brings new security requirements. Cloudflare CASB delivers out-of-band visibility across these tools, surfacing insights on top of inline controls. With posture, access, and data under control, organizations can embrace GenAI confidently and securely.</p><p><b>How to get started:</b></p><ul><li><p><b>For existing Cloudflare One customers:</b> Contact your account manager or enable the integrations directly in your dashboard today.</p></li><li><p><b>New to Cloudflare One?</b> <a href="https://dash.cloudflare.com/sign-up/zero-trust"><u>Sign up now</u></a> for 50 free seats to begin securely using Gen AI immediately. For larger deployments, request a <a href="https://www.cloudflare.com/products/zero-trust/plans/enterprise/?utm_medium=referral&amp;utm_source=blog&amp;utm_campaign=2025-q3-acq-gbl-connectivity-ge-ge-general-ai_week_blog"><u>consultation with our experts</u></a>.</p></li></ul><p>If you want to preview other new functionality and help shape our roadmap,<a href="https://www.cloudflare.com/lp/ai-security-user-research-program-2025"><u> express interest in our user research program</u></a> for <a href="https://www.cloudflare.com/learning/ai/what-is-ai-security/">AI security</a>. </p><div>
  
</div><p></p> ]]></content:encoded>
            <category><![CDATA[AI Week]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[AI-SPM]]></category>
            <category><![CDATA[CASB]]></category>
            <category><![CDATA[Cloudflare One]]></category>
            <category><![CDATA[SASE]]></category>
            <category><![CDATA[SAAS Security]]></category>
            <guid isPermaLink="false">ZCOT8h5K8IwD7kDikj0G1</guid>
            <dc:creator>Alex Dunbrack</dc:creator>
        </item>
        <item>
            <title><![CDATA[Block unsafe prompts targeting your LLM endpoints with Firewall for AI]]></title>
            <link>https://blog.cloudflare.com/block-unsafe-llm-prompts-with-firewall-for-ai/</link>
            <pubDate>Tue, 26 Aug 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare's AI security suite now includes unsafe content moderation, integrated into the Application Security Suite via Firewall for AI.  ]]></description>
            <content:encoded><![CDATA[ <p>Security teams are racing to <a href="https://www.cloudflare.com/the-net/vulnerable-llm-ai/"><u>secure a new attack surface</u></a>: AI-powered applications. From chatbots to search assistants, LLMs are already shaping customer experience, but they also open the door to new risks. A single malicious prompt can exfiltrate sensitive data, <a href="https://www.cloudflare.com/learning/ai/data-poisoning/"><u>poison a model</u></a>, or inject toxic content into customer-facing interactions, undermining user trust. Without guardrails, even the best-trained model can be turned against the business.</p><p>Today, as part of AI Week, we’re expanding our <a href="https://www.cloudflare.com/ai-security/">AI security offerings</a> by introducing unsafe content moderation, now integrated directly into Cloudflare <a href="https://developers.cloudflare.com/waf/detections/firewall-for-ai/"><u>Firewall for AI</u></a>. Built with Llama, this new feature allows customers to leverage their existing Firewall for AI engine for unified detection, analytics, and topic enforcement, providing real-time protection for <a href="https://www.cloudflare.com/learning/ai/what-is-large-language-model/"><u>Large Language Models (LLMs)</u></a> at the network level. Now with just a few clicks, security and application teams can detect and block harmful prompts or topics at the edge — eliminating the need to modify application code or infrastructure.

This feature is immediately available to current Firewall for AI users. Those not yet onboarded can contact their account team to participate in the beta program.</p>
    <div>
      <h2>AI protection in application security</h2>
      <a href="#ai-protection-in-application-security">
        
      </a>
    </div>
    <p>Cloudflare's Firewall for AI <a href="https://blog.cloudflare.com/best-practices-sase-for-ai/">protects user-facing LLM applications</a> from abuse and data leaks, addressing several of the <a href="https://www.cloudflare.com/learning/ai/owasp-top-10-risks-for-llms/"><u>OWASP Top 10 LLM risks</u></a> such as prompt injection, PII disclosure, and unbound consumption. It also extends protection to other risks such as unsafe or harmful content.</p><p>Unlike built-in controls that vary between model providers, Firewall for AI is model-agnostic. It sits in front of any model you choose, whether it’s from a third party like OpenAI or Gemini, one you run in-house, or a custom model you have built, and applies the same consistent protections.</p><p>Just like our origin-agnostic <a href="https://www.cloudflare.com/application-services/#application-services-case-products"><u>Application Security suite</u></a>, Firewall for AI enforces policies at scale across all your models, creating a unified security layer. That means you can define guardrails once and apply them everywhere. For example, a financial services company might require its LLM to only respond to finance-related questions, while blocking prompts about unrelated or sensitive topics, enforced consistently across every model in use.</p>
    <div>
      <h2>Unsafe content moderation protects businesses and users</h2>
      <a href="#unsafe-content-moderation-protects-businesses-and-users">
        
      </a>
    </div>
    <p>Effective AI moderation is more than blocking “bad words”, it’s about setting boundaries that protect users, meeting legal obligations, and preserving brand integrity, without over-moderating in ways that silence important voices.</p><p>Because LLMs cannot be fully scripted, their interactions are inherently unpredictable. This flexibility enables rich user experiences but also opens the door to abuse.</p><p>Key risks from unsafe prompts include misinformation, biased or offensive content, and model poisoning, where repeated harmful prompts degrade the quality and safety of future outputs. Blocking these prompts aligns with the OWASP Top 10 for LLMs, preventing both immediate misuse and long-term degradation.</p><p>One example of this is<a href="https://www.theverge.com/2016/3/24/11297050/tay-microsoft-chatbot-racist"> <b><u>Microsoft’s Tay chatbot</u></b></a>. Trolls deliberately submitted toxic, racist, and offensive prompts, which Tay quickly began repeating. The failure was not only in Tay’s responses; it was in the lack of moderation on the inputs it accepted.</p>
    <div>
      <h2>Detecting unsafe prompts before reaching the model</h2>
      <a href="#detecting-unsafe-prompts-before-reaching-the-model">
        
      </a>
    </div>
    <p>Cloudflare has integrated <a href="https://huggingface.co/meta-llama/Llama-Guard-3-8B"><u>Llama Guard</u></a> directly into Firewall for AI. This brings AI input moderation into the same rules engine our customers already use to protect their applications. It uses the same approach that we created for developers building with AI in our <a href="https://blog.cloudflare.com/guardrails-in-ai-gateway/"><u>AI Gateway</u></a> product.</p><p>Llama Guard analyzes prompts in real time and flags them across multiple safety categories, including hate, violence, sexual content, criminal planning, self-harm, and more.</p><p>With this integration, Firewall for AI not only <a href="https://blog.cloudflare.com/take-control-of-public-ai-application-security-with-cloudflare-firewall-for-ai/#discovering-llm-powered-applications"><u>discovers LLM traffic</u></a> endpoints automatically, but also enables security and AI teams to take immediate action. Unsafe prompts can be blocked before they reach the model, while flagged content can be logged or reviewed for oversight and tuning. Content safety checks can also be combined with other Application Security protections, such as <a href="https://www.cloudflare.com/application-services/products/bot-management/"><u>Bot Management</u> </a>and <a href="https://www.cloudflare.com/application-services/products/rate-limiting/"><u>Rate Limiting</u></a>, to create layered defenses when protecting your model.</p><p>The result is a single, edge-native policy layer that enforces guardrails before unsafe prompts ever reach your infrastructure — without needing complex integrations.</p>
    <div>
      <h2>How it works under the hood</h2>
      <a href="#how-it-works-under-the-hood">
        
      </a>
    </div>
    <p>Before diving into the architecture of Firewall for AI engine and how it fits within our previously mentioned module to detect <a href="https://blog.cloudflare.com/take-control-of-public-ai-application-security-with-cloudflare-firewall-for-ai/#using-workers-ai-to-deploy-presidio"><u>PII in the prompts</u></a>, let’s start with how we detect unsafe topics.</p>
    <div>
      <h3>Detection of unsafe topics</h3>
      <a href="#detection-of-unsafe-topics">
        
      </a>
    </div>
    <p>A key challenge in building safety guardrails is balancing a good detection with model helpfulness. If detection is too broad, it can prevent a model from answering legitimate user questions, hurting its utility. This is especially difficult for topic detection because of the ambiguity and dynamic nature of human language, where context is fundamental to meaning. </p><p>Simple approaches like keyword blocklists are interesting for precise subjects — but insufficient. They are easily bypassed and fail to understand the context in which words are used, leading to poor recall. Older probabilistic models such as <a href="https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation"><u>Latent Dirichlet Allocation (LDA)</u></a> were an improvement, but did not properly account for word ordering and other contextual nuances. 

Recent advancements in LLMs introduced a new paradigm. Their ability to perform zero-shot or few-shot classification is uniquely suited for the task of topic detection. For this reason, we chose <a href="https://huggingface.co/meta-llama/Llama-Guard-3-8B"><u>Llama Guard 3</u></a>, an open-source model based on the Llama architecture that is specifically fine-tuned for content safety classification. When it analyzes a prompt, it answers whether the text is safe or unsafe, and provides a specific category. We are showing the default categories, as listed <a href="http://developers.cloudflare.com/ruleset-engine/rules-language/fields/reference/cf.llm.prompt.unsafe_topic_categories/"><u>here</u></a>. Because Llama 3 has a fixed knowledge cutoff, certain categories — like defamation or elections — are time-sensitive. As a result, the model may not fully capture events or context that emerged after it was trained, and that’s important to keep in mind when relying on it.</p><p>For now, we cover the 13 default categories. We plan to expand coverage in the future, leveraging the model’s zero-shot capabilities.</p>
    <div>
      <h3>A scalable architecture for future detections</h3>
      <a href="#a-scalable-architecture-for-future-detections">
        
      </a>
    </div>
    <p>We designed Firewall for AI to scale without adding noticeable latency, including Llama Guard, and this remains true even as we add new detection models.</p><p>To achieve this, we built a new asynchronous architecture. When a request is sent to an application protected by Firewall for AI, a Cloudflare Worker makes parallel, non-blocking requests to our different detection modules — one for PII, one for unsafe topics, and others as we add them. </p><p>Thanks to the Cloudflare network, this design scales to handle high request volumes out of the box, and latency does not increase as we add new detections. It will only be bounded by the slowest model used. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4Y2gTP6teVR2263UIEWHc9/9a31fb394cee6c437c1d4af6f71d867c/image3.png" />
          </figure><p>We optimize to keep the model utility at its maximum while keeping the guardrail detection broad enough.</p><p>Llama Guard is a rather large model, so running it at scale with minimal latency is a challenge. We deploy it on <a href="https://www.cloudflare.com/developer-platform/products/workers-ai/"><u>Workers AI</u></a>, leveraging our large fleet of high performance GPUs. This infrastructure ensures we can offer fast, reliable inference throughout our network.</p><p>To ensure the system remains fast and reliable as adoption grows, we ran extensive load tests simulating the requests per second (RPS) we anticipate, using a wide range of prompt sizes to prepare for real-world traffic. To handle this, the number of model instances deployed on our network scales automatically with the load. We employ concurrency to minimize latency and optimize for hardware utilization. We also enforce a hard 2-second threshold for each analysis; if this time limit is reached, we fall back to any detections already completed, ensuring your application's requests latency is never further impacted.</p>
    <div>
      <h3>From detection to security rules enforcement</h3>
      <a href="#from-detection-to-security-rules-enforcement">
        
      </a>
    </div>
    <p>Firewall for AI follows the same familiar pattern as other Application Security features like Bot Management and WAF Attack Score, making it easy to adopt.</p><p>Once enabled, the <a href="https://developers.cloudflare.com/waf/detections/firewall-for-ai/#fields"><u>new fields</u></a> appear in <a href="https://developers.cloudflare.com/waf/analytics/security-analytics/"><u>Security Analytics</u></a> and expanded logs. From there, you can filter by unsafe topics, track trends over time, and drill into the results of individual requests to see all detection outcomes, for example: did we detect unsafe topics, and what are the categories. The request body itself (the prompt text) is not stored or exposed; only the results of the analysis are logged.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/722JxyLvT6DFQxFpQhHMYP/3f1a6aa8ef1dafe4ad1a8277578fd7ae/image2.png" />
          </figure><p>After reviewing the analytics, you can enforce unsafe topic moderation by creating rules to log or block based on prompt categories in <a href="https://developers.cloudflare.com/waf/custom-rules/"><u>Custom rules</u></a>.</p><p>For example, you might log prompts flagged as sexual content or hate speech for review. </p><p>You can use this expression: 
<code>If (any(cf.llm.prompt.unsafe_topic_categories[*] in {"S10" "S12"})) then Log</code>

Or deploy the rule with the categories field in the dashboard as in the below screenshot.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2CUsVjjpCEqv2UQMU6cMmt/5307235338c1b58856c0685585347537/image4.png" />
          </figure><p>You can also take a broader approach by blocking all unsafe prompts outright:
<code>If (cf.llm.prompt.unsafe_topic_detected)then Block</code></p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3uRT9YlRlRPsL5bNyBFA3i/54eb171ecb48aaecc7876b972789bf15/image5.png" />
          </figure><p>These rules are applied automatically to all discovered HTTP requests containing prompts, ensuring guardrails are enforced consistently across your AI traffic.</p>
    <div>
      <h2>What’s Next</h2>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>In the coming weeks, Firewall for AI will expand to detect prompt injection and jailbreak attempts. We are also exploring how to add more visibility in the analytics and logs, so teams can better validate detection results. A major part of our roadmap is adding model response handling, giving you control over not only what goes into the LLM but also what comes out. Additional abuse controls, such as rate limiting on tokens and support for more safety categories, are also on the way.</p><p>Firewall for AI is available in beta today. If you’re new to Cloudflare and want to explore how to implement these AI protections, <a href="https://www.cloudflare.com/plans/enterprise/contact/?utm_medium=referral&amp;utm_source=blog&amp;utm_campaign=2025-q3-acq-gbl-connectivity-ge-ge-general-ai_week_blog"><u>reach out for a consultation</u></a>. If you’re already with Cloudflare, contact your account team to get access and start testing with real traffic.</p><p>Cloudflare is also opening up a user research program focused on <a href="https://www.cloudflare.com/learning/ai/what-is-ai-security/">AI security</a>. If you are curious about previews of new functionality or want to help shape our roadmap, <a href="https://www.cloudflare.com/lp/ai-security-user-research-program-2025"><u>express your interest here</u></a>.</p><div>
  
</div><p></p> ]]></content:encoded>
            <category><![CDATA[AI Week]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[LLM]]></category>
            <category><![CDATA[WAF]]></category>
            <category><![CDATA[AI]]></category>
            <guid isPermaLink="false">59hk6A3nH3YcLMjXhYnNof</guid>
            <dc:creator>Radwa Radwan</dc:creator>
            <dc:creator>Mathias Deschamps</dc:creator>
        </item>
    </channel>
</rss>