
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Wed, 08 Apr 2026 00:41:29 GMT</lastBuildDate>
        <item>
            <title><![CDATA[The most-seen UI on the Internet? Redesigning Turnstile and Challenge Pages]]></title>
            <link>https://blog.cloudflare.com/the-most-seen-ui-on-the-internet-redesigning-turnstile-and-challenge-pages/</link>
            <pubDate>Fri, 27 Feb 2026 06:00:00 GMT</pubDate>
            <description><![CDATA[ We serve 7.6 billion challenges daily. Here’s how we used research, AAA accessibility standards, and a unified architecture to redesign the Internet’s most-seen user interface. ]]></description>
            <content:encoded><![CDATA[ <p>You've seen it. Maybe you didn't register it consciously, but you've seen it. That little widget asking you to verify you're human. That full-page security check before accessing a website. If you've spent any time on the Internet, you've encountered Cloudflare's Turnstile widget or Challenge Pages — likely more times than you can count.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5YaxxmA9nz7AufmcJmhagL/0db6b65ec7456bc8091affc6beaf3ec2/Image_1_-_Turnstile.png" />
          </figure><p><sup><i>The Turnstile widget – a familiar sight across millions of websites</i></sup></p><p>When we say that a large portion of the Internet sits behind Cloudflare, we mean it. Our Turnstile widget and Challenge Pages are served 7.67 billion times every single day. That's not a typo. Billions. This might just be the most-seen user interface on the Internet.</p><p>And that comes with enormous responsibility.</p><p>Designing a product with billions of eyeballs on it isn't just challenging — it requires a fundamentally different approach. Every pixel, every word, every interaction has to work for someone's grandmother in rural Japan, a teenager in São Paulo, a visually impaired developer in Berlin, and a busy executive in Lagos. All at the same time. In moments of frustration.</p><p>Today we’re sharing the story of how we redesigned Turnstile and Challenge Pages. It's a story told in three parts, by three of us: the design process and research that shaped our decisions (Leo), the engineering challenge of deploying changes at unprecedented scale (Ana), and the measurable impact on billions of users (Marina).</p><p>Let's start with how we approached the problem from a design perspective.</p>
    <div>
      <h2>Part 1: The design process</h2>
      <a href="#part-1-the-design-process">
        
      </a>
    </div>
    
    <div>
      <h3>The problem</h3>
      <a href="#the-problem">
        
      </a>
    </div>
    <p>Let's be honest: nobody likes being asked to prove they're human. You know you're human. I know I'm human. The only one who doesn't seem convinced is that little widget standing between you and the website you're trying to access. At best, it's a minor inconvenience. At worst? You've probably wanted to throw your computer out the window in a fit of rage. We've all been there. And no one would blame you.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/640zjNaqDcNdJy4mYN6H14/ce184df68c9612d77f0767726bf27822/2.png" />
</figure><p><sup><i>Turnstile integrated into a login flow</i></sup></p><p>As the world warms up to what appears to be an inevitable AI revolution, the need for security verification is only increasing. At Cloudflare, we've seen a significant rise in bot attacks — and in response, organizations are investing more heavily in security measures. That means more challenges being issued to more end users, more often.</p><p>The numbers tell the story:</p><ul><li><p>2023: 2.14B daily</p></li><li><p>2024: 3B daily</p></li><li><p>2025: 5.35B daily</p></li></ul><p>That's an average increase of 58.1% in security checks, compounded year over year. More security checks mean more opportunities for end user frustration. The more companies integrate these verification systems to protect themselves and their customers, the higher the chance that someone, somewhere, is going to have a bad experience.</p><p>We knew it was time to take a hard look at our flagship products and ask ourselves: Are we doing right by the billions of people who encounter these experiences? Are we fulfilling our mission to build a better Internet — not just a more secure one, but a more human one?</p><p>The answer, we discovered, was: we could do better.</p>
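<p>The 58.1% figure is the compound annual growth rate implied by the 2023 and 2025 endpoints. A quick, illustrative sketch of the arithmetic:</p>

```typescript
// Compound annual growth rate between two measurements taken `years` apart.
function cagr(start: number, end: number, years: number): number {
  return Math.pow(end / start, 1 / years) - 1;
}

// 2.14B daily challenges in 2023 growing to 5.35B in 2025 (two years):
const growth = cagr(2.14, 5.35, 2);
console.log((growth * 100).toFixed(1)); // "58.1"
```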
    <div>
      <h3>The design audit</h3>
      <a href="#the-design-audit">
        
      </a>
    </div>
    <p>Before redesigning anything, we needed to understand what we were working with. We started by conducting a comprehensive audit of every state, every error message, and every interaction across both Turnstile and Challenge Pages.</p><p>What we found wasn't the best.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1g1exDgeRH9QlApBXItcfL/fb0051d1dabaa6c91cf976ef64793502/3.png" />
          </figure><p><sup><i>The state of inconsistency in the Turnstile widget. Multiple states with no unified approach</i></sup></p><p>The inconsistencies were glaring. We had no unified approach across the multitude of different error scenarios. Some messages were overly verbose and technical ("Your device clock is set to a wrong time or this challenge page was accidentally cached by an intermediary and is no longer available"). Others were too vague to be helpful ("Timed out"). The visual language varied wildly — different layouts, different hierarchies, different tones of voice.</p><p>We also examined the feedback we'd received online. Social media, support tickets, community forums — we read it all. The frustration was palpable, and much of it was avoidable.</p><p>Take our feedback mechanism, for example. We offered users feedback options like "The widget sometimes fails" versus "The widget fails all the time." But what's the difference, really? And how were they supposed to know how often it failed? We were asking users to interpret ambiguous options during their most frustrated moments. The more we left open to interpretation, the less useful the feedback became — and the more frustration we saw across social channels.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5xKRSM0FfDZikEECwgHoof/ad55208973698cb237444c21d384aff8/4.png" />
          </figure><p><sup><i>The previous feedback screen: "The widget sometimes fails" vs "The widget fails all the time" — what's the difference?</i></sup></p><p>Our Challenge Pages — the full-page security blocks that appear when we detect suspicious activity or when site owners have heightened security settings — had similar issues. Some states were confusing. Others used too much technical jargon. Many failed to provide actionable guidance when users needed it most.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5JUxHjJ4VG13F7QfLONJEQ/fa443e5dd24f10d0c256864cd3f42734/5.png" />
          </figure><p><sup><i>The state of inconsistency on the Challenge pages. Multiple states with no unified approach</i></sup></p><p>The audit was humbling. But it gave us a clear picture of where we needed to focus.</p>
    <div>
      <h2>Mapping the user journey</h2>
      <a href="#mapping-the-user-journey">
        
      </a>
    </div>
    <p>To design better experiences, we first needed to understand every possible path a user could take. What was the happy path? Was there even one? And what were the unhappy paths that led to escalating frustration?</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1oTbFZoRu7guIxzoe64qcm/4f579fe2e70d6225a51504b3de10030f/6.png" />
          </figure><p><sup><i>Mapping the complete user journey — from initial encounter through error scenarios, with sentiment tracking</i></sup></p><p>This was a true cross-functional effort. We worked closely with engineers like Ana who knew the technical ins and outs of every edge case, and with Marina on the product side who understood not just how the product worked, but how users felt about it — the love and the hate we'd see online.</p><p>We have some of the smartest people working on bot protection at Cloudflare. But intelligence and clarity aren't the same thing. There's a delicate balance between technical complexity and user simplicity. Only when these two dance together successfully can we communicate information in a way that actually makes sense to people.</p><p>And here's the thing: the messaging has to work for everyone. A person of any age. Any mental or physical capability. Any cultural background. Any level of technical sophistication. That's what designing at scale really means — you can’t ignore edge cases, since, at such scale, they are no longer edge cases.</p>
    <div>
      <h2>Establishing a unified information architecture</h2>
      <a href="#establishing-a-unified-information-architecture">
        
      </a>
    </div>
    <p>One of the most influential books in UX design is Steve Krug's <a href="https://sensible.com/dont-make-me-think/"><u>Don't Make Me Think</u></a>. The core principle is simple: every moment a user spends trying to interpret, understand, or decode your interface is a moment of friction. And friction, especially in moments of frustration, leads to abandonment.</p><p>Our audit revealed that we were asking users to think far too much. Different pieces of information occupied the same space in the UI across different states. There was no consistent visual hierarchy. Users encountering an error state in Turnstile would find information in a completely different place than they would on a Challenge Page.</p><p>We made a fundamental decision: <b>one information architecture to rule them all</b>.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3runU0ihKhNpgdw3LxNZUv/aa4bd76efb5847fde0659bccdae7242d/7.png" />
          </figure><p><sup><i>Visual diagram displaying a unified information architecture with a consistent structure across Turnstile widget and Challenge pages</i></sup></p><p>Both Turnstile and Challenge Pages would now follow the same structural pattern. The same visual hierarchy. The same placement for actions, for explanatory text, for links to documentation.</p><p>Did this constrain our design options? Absolutely. We had to say no to a lot of creative ideas that didn't fit the framework. But constraints aren't the enemy of good design — they're often its best friend. By limiting our options, we could go deeper on the details that actually mattered.</p><p>For users, the benefit is profound: they don't need to re-learn what each piece of the UI means. Error states look consistent. Help links are always in the same place. Once you understand one state, you understand them all. That's cognitive load reduced to a minimum — exactly where it should be during a security verification.</p>
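<p>One way to picture "one information architecture to rule them all" is as a single typed shape that both surfaces render. The sketch below is illustrative only; the field names are our assumptions, not the actual schema:</p>

```typescript
// A single view model shared by the compact widget and the full challenge
// page. The renderers differ; the structure and slot order never do.
// (Field names are hypothetical, for illustration only.)
interface VerificationView {
  status: string;             // e.g. "Verifying..." or "Incorrect device time"
  detail?: string;            // short explanatory text, always directly below the status
  action?: { label: string }; // the single action slot, e.g. the Troubleshoot link
  helpHref?: string;          // documentation link, always in the same place
}

// An error state is just another instance of the same shape:
const deviceTimeError: VerificationView = {
  status: "Incorrect device time",
  action: { label: "Troubleshoot" },
};
```

<p>The payoff of a shared shape is exactly the "learn once" effect described above: any renderer that consumes it puts each field in the same slot every time.</p>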
    <div>
      <h2>What user research taught us</h2>
      <a href="#what-user-research-taught-us">
        
      </a>
    </div>
<p>How do you keep yourself accountable when redesigning something that billions of people see? You test. A lot.</p><p>We recruited 8 participants across 8 different countries, deliberately seeking diversity in age, digital savviness, and cultural background. We weren't looking for tech-savvy early adopters — we wanted to understand how the redesign would work for everyone.</p><p>Our approach was rigorous: participants saw both the current experience and proposed changes, without knowing which was "old" or "new." We counterbalanced positioning to eliminate bias. And we didn't just test our new ideas; we also challenged our assumptions about what needed changing in the first place.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/59mmLHihbM9TewXmlYQwbO/e5db88efca948de1b31e9dc499195eb8/8.png" />
          </figure><p><sup><i>Two different versions of a Turnstile being tested in an A/B test</i></sup></p>
    <div>
      <h3>Some things didn’t need fixing</h3>
      <a href="#some-things-didnt-need-fixing">
        
      </a>
    </div>
    <p>One hypothesis: should we align with competitors? Most CAPTCHA providers show "I am human" across all states. We use distinct content — "Verify you are human," then "Verifying...," then "Success!"</p><p>Were we overcomplicating things? We tested it head-to-head.</p><p>Our approach won decisively. For the interactivity state, "Verify you are human" scored 5 out of 8 points versus just 3 for "I am human." For the verifying state, it was even more dramatic — 7.5 versus 0.5. Users wanted to know what was happening, not just be told what they were.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6ke1kO0i7EZxZm6voQBpyn/f489bef9b66d1221aa89adb5746559b7/9.png" />
          </figure><p><sup><i>User testing results: users strongly favored our approach over the competitor-style design</i></sup></p><p>This experiment didn't ship as a feature, but it was invaluable. It gave us confidence we weren't just being different for the sake of it. Some things were already right.</p>
    <div>
      <h3>But these needed to change</h3>
      <a href="#but-these-needed-to-change">
        
      </a>
    </div>
    <p>The research surfaced four areas where we were failing users:</p><p><b>Help, not bureaucracy</b>. When users encountered errors, we offered "Send Feedback." In testing, they were baffled. "Who am I sending this to? The website? Cloudflare? My ISP?" More importantly, we discovered something fundamental: at the moment of maximum frustration, people don't want to file a report — they want to fix the problem. We replaced "Send Feedback" with "Troubleshoot" — a single word that promises action rather than bureaucracy.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2jN2reUR55qCbssCDFTZfB/fb5396ec853ee549ebfec5d0d94b901f/10.png" />
          </figure><p><sup><i>The problematic "Send Feedback" prompt: users didn't know who they were sending feedback to</i></sup></p><p><b>Attention, not alarm</b>. We'd used red backgrounds liberally for errors. The reaction in testing was visceral — participants felt they had failed, felt powerless. Even for simple issues that would resolve with a retry, users assumed the worst and gave up. Red at full saturation wasn't communicating "Here's something to address." It was communicating "You have failed, and there's nothing you can do." The fix: red only for icons, never for text or backgrounds.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5seE6Xcrj9lvSpBYDEkk6N/7f0c1c17fd86b05397d35b685b0addfb/11.png" />
</figure><p><sup><i>The evolution: from states with unclear error descriptions in red to clearer, more concise error communication in neutral-color text.</i></sup></p><p><b>Scannable, not verbose</b>. We'd tried to be thorough, explaining errors in technical detail. It backfired. Non-technical users found it alienating. Technical users didn't need it. Everyone was trying to read it in the tiny real estate of a widget. The lesson: less is more, especially in constrained spaces during stressful moments.</p><p><b>Accessible to everyone</b>. Our audit revealed 10px fonts in some states. Grey text that technically met AA (at least 4.5:1 for normal text and 3:1 for large text) compliance but was difficult to read in practice. "Technically compliant" isn't good enough when you're serving the entire Internet.</p><p>We set a clear goal: to meet the <a href="https://www.w3.org/TR/WCAG22/"><u>WCAG 2.2 AAA</u></a> standard — the highest and most stringent level of web accessibility compliance, designed to make content accessible to the broadest range of users, including those with severe disabilities. Throughout the redesign, when visual consistency conflicted with readability, readability won. Every time.</p><p>This extended beyond vision. We designed for screen reader users, keyboard-only navigators, and people with color vision variations — going beyond what automated compliance tools can catch.</p><p>And accessibility isn't just about impairments — it's about language. What fits in English, overflows in German. What's concise in Spanish is ambiguous in Japanese. Supporting over 40 languages forced us to radically simplify. The same "Unable to connect to website / Troubleshoot" pattern now works across English, Bulgarian, Danish, German, Greek, Japanese, Indonesian, Russian, Slovak, Slovenian, Serbian, Filipino, and many more.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6e4pvgMUS4BUXsPqi1qV6l/b6ffdc0d5f1e8e90394169db7162d10c/12.png" />
          </figure><p><sup><i>The redesigned error state across 12 languages — consistent layout despite varying text lengths </i></sup></p>
    <div>
      <h2>Final redesign</h2>
      <a href="#final-redesign">
        
      </a>
    </div>
    <p>So what did we actually ship?</p><p>First, let's talk about what we didn't change. The happy path — "Verify you are human" → "Verifying..." → "Success!" — tested exceptionally well. Users understood what was happening at each stage. The distinct content for each state, which we'd worried might be overcomplicating things, was actually our competitive advantage.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2R4QJ04uz9r1TVZjuqsHG9/61c1023eaa105b4841456258f3370220/13.png" />
          </figure><p><i><sup> The happy path: Verify you are human → Verifying → Success! These states tested well and remained largely unchanged</sup></i></p><p>But for the states that needed work, we made significant changes guided by everything we learned.</p>
    <div>
      <h3>Simplified, scannable content</h3>
      <a href="#simplified-scannable-content">
        
      </a>
    </div>
    <p>We radically reduced the amount of text in error states. Instead of verbose explanations like "Your device clock is set to a wrong time or this challenge page was accidentally cached by an intermediary and is no longer available," we now show:</p><ol><li><p>A clear, simple state name (e.g., "Incorrect device time")</p></li><li><p>A prominent "Troubleshoot" link</p></li></ol><p>That's it. The detailed guidance now lives in a dedicated modal screen that opens when users need it — giving them room to actually read and follow troubleshooting steps.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4ZYjlJgw6DOiTuJBFXpewn/5d714c3a19723dfe9fa9802d0d5926b8/14.png" />
          </figure><p><sup><i>The troubleshooting modal: detailed guidance when users need it, without cluttering the widget</i></sup></p><p>The troubleshooting modal provides context ("This error occurs when your device's clock or calendar is inaccurate. To complete this website’s security verification process, your device must be set to the correct date and time in your time zone."), numbered steps to try, links to documentation, and — only after the user has tried to resolve the issue — an option to submit feedback to Cloudflare. Help first, feedback second.</p>
    <div>
      <h3>AAA accessibility compliance</h3>
      <a href="#aaa-accessibility-compliance">
        
      </a>
    </div>
    <p>Every state now meets WCAG 2.2 AAA standards for contrast and readability. Font sizes have established minimums. Interactive elements are clearly focusable and properly announced by screen readers.</p>
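<p>For reference, the AA and AAA contrast thresholds (4.5:1 versus AAA's 7:1 for normal-size text) both come from WCAG's relative-luminance formula. A self-contained sketch of the computation, using the standard WCAG 2.x constants:</p>

```typescript
// WCAG relative luminance of an sRGB color (components 0-255).
function luminance(r: number, g: number, b: number): number {
  const lin = (c: number) => {
    const s = c / 255;
    // Piecewise sRGB linearization, per the WCAG 2.x definition.
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * lin(r) + 0.7152 * lin(g) + 0.0722 * lin(b);
}

// Contrast ratio: (L_lighter + 0.05) / (L_darker + 0.05), ranging 1:1 to 21:1.
function contrastRatio(
  fg: [number, number, number],
  bg: [number, number, number],
): number {
  const l1 = luminance(...fg);
  const l2 = luminance(...bg);
  const [hi, lo] = l1 > l2 ? [l1, l2] : [l2, l1];
  return (hi + 0.05) / (lo + 0.05);
}

// Black on white is the maximum possible ratio, 21:1;
// AAA requires at least 7:1 for normal-size text.
console.log(contrastRatio([0, 0, 0], [255, 255, 255])); // 21
```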
    <div>
      <h3>Unified experience across Turnstile and Challenge pages</h3>
      <a href="#unified-experience-across-turnstile-and-challenge-pages">
        
      </a>
    </div>
    <p>Whether users encounter the compact Turnstile widget or a full Challenge Page, the information architecture is now consistent. Same hierarchy. Same placement. Same mental model.</p><p>Challenge Pages now follow a clean structure: the website name and favicon at the top, a clear status message (like "Verification successful" or "Your browser is out of date"), and actionable guidance below. No more walls of orange or red text. No more technical jargon without context.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4PuWePTOaLihpfqm2iimJW/e34c4a009c36524a6d72c15ae0f78d00/15.png" />
          </figure><p><sup><i>Re-designed Challenge page states with clear troubleshooting instructions.</i></sup></p>
    <div>
      <h3>Validated across languages</h3>
      <a href="#validated-across-languages">
        
      </a>
    </div>
    <p>Every piece of content was tested in over 40 supported languages. Our process involved three layers of validation:</p><ol><li><p>Initial design review by the design team</p></li><li><p>Professional translation by our qualified vendor</p></li><li><p>Final review by native-speaking Cloudflare employees</p></li></ol><p>This wasn't just about translation accuracy — it was about ensuring the visual design held up when content length varied dramatically between languages.</p>
    <div>
      <h3>The complete picture</h3>
      <a href="#the-complete-picture">
        
      </a>
    </div>
    <p>The result is a security verification experience that's clearer, more accessible, less frustrating, and — crucially — just as secure. We didn't compromise on protection to improve the experience. We proved that good design and strong security aren't in conflict.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5t6FRRzLamGaTbEiZqVpnf/92b688679d1c8265ba3c6fd4159061bf/16.png" />
          </figure><p><sup><i>Re-designed Turnstile widgets on the left and a re-designed Challenge page on the right</i></sup></p><p>But designing the experience was only half the battle. Shipping it to billions of users? That's where Ana comes in.</p>
    <div>
      <h2>Part 2: Shipping to billions</h2>
      <a href="#part-2-shipping-to-billions">
        
      </a>
    </div>
    
    <div>
      <h4><b>Beyond centering a div</b></h4>
      <a href="#beyond-centering-a-div">
        
      </a>
    </div>
<p>Some may say the hardest part of being a Frontend Engineer is centering a div. In reality, the real challenge often lies much deeper, especially when working close to the platform primitives. Building a critical piece of Internet infrastructure using native APIs forces you to think differently about UI development, tradeoffs, and long-term maintainability.</p><p>In our case, we use Rust to handle the UI for both the Turnstile widget and the Challenge page. This decision brought clear benefits in terms of safety and consistency across platforms, but it also increased frontend complexity. Many of us are used to the ergonomics of modern frameworks like React, where common UI interactions come almost for free. Working with Rust meant reimplementing even simple interactions using lower-level constructs like <i>document.getElementById</i>, <i>createElement</i>, and <i>appendChild</i>.</p><p>On top of that, compile times and strict checks naturally slowed down rapid UI iteration compared to JavaScript-based frameworks. Debugging was also more involved, as the tooling ecosystem is still evolving. These constraints pushed us to be more deliberate, more thoughtful, and ultimately more disciplined in how we approached UI development.</p>
    <div>
      <h4><b>Small visual changes, big global impact</b></h4>
      <a href="#small-visual-changes-big-global-impact">
        
      </a>
    </div>
<p>What initially looked like small visual tweaks, such as padding adjustments or alignment changes, quickly revealed a much bigger challenge: internationalization.</p><p>Once translations were available, we had to ensure that content remained readable and usable across 38 languages and 16 different UI states. Text length variability alone required careful design decisions. Some translations can be 30 to 300 percent longer than English. A short English string like “Stuck?” becomes “Tidak bisa melanjutkan?” in Indonesian or “Es geht nicht weiter?” in German, dramatically changing layout requirements.</p><p>Right-to-left language support added another layer of complexity. Supporting Arabic, Persian (Farsi), and Hebrew meant more than flipping text direction. Entire layouts had to be mirrored, including alignment, navigation patterns, directional icons, and animation flows. Many of these elements are implicitly designed with left-to-right assumptions, so we had to revisit those decisions and make them truly bidirectional.</p><p>Ordered lists also required special care. Not every culture uses the Western 1, 2, 3 numbering system, and hardcoding numeric sequences can make interfaces feel foreign or incorrect. We leaned on locale-aware numbering and fully translatable list formats to ensure ordering felt natural and culturally appropriate in every language.</p>
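<p>The production widget handles this in Rust, but the same locale rules are easy to demonstrate with the ECMAScript <i>Intl</i> API. This is an illustrative sketch, not our implementation; the helper names are ours:</p>

```typescript
// Locale-aware numbering for ordered troubleshooting steps: CLDR locale
// data picks the digit system, so nothing is hardcoded as "1, 2, 3".
function stepLabel(n: number, locale: string): string {
  return new Intl.NumberFormat(locale).format(n);
}
stepLabel(3, "en-US"); // "3"
stepLabel(3, "ar-EG"); // "٣" — Eastern Arabic-Indic digits (with full ICU data)

// Direction is a property of the language, not of the layout: a simplified
// lookup covering a few of the right-to-left languages mentioned above.
const RTL_LANGS = new Set(["ar", "fa", "he"]);
function direction(locale: string): "rtl" | "ltr" {
  return RTL_LANGS.has(locale.split("-")[0]) ? "rtl" : "ltr";
}
direction("ar-EG"); // "rtl"
direction("de");    // "ltr"
```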
    <div>
      <h4><b>Building confidence through testing</b></h4>
      <a href="#building-confidence-through-testing">
        
      </a>
    </div>
    <p>As we started listing action points in feedback reports, correctness became even more critical. Every action needed to render properly, trigger the right flow, and behave consistently across states, languages, and edge cases.</p><p>To get there, we invested heavily in testing. Unit tests helped us validate logic in isolation, while end-to-end tests ensured that new states and languages worked as expected in real scenarios. This testing foundation gave us confidence to iterate safely, prevented regressions, and ensured that feedback reports remained reliable and actionable for users.</p>
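<p>As a flavor of what those unit tests pin down, consider a reduced model of the widget's state-to-copy mapping. The state names are assumptions and the strings are taken from the designs above; the real implementation is in Rust:</p>

```typescript
// Hypothetical reduced model of the widget's visible states.
type WidgetState = "interactive" | "verifying" | "success" | "error";

interface View {
  message: string;
  action?: string; // error states expose exactly one action: "Troubleshoot"
}

// Pure function from state to rendered copy: trivially unit-testable,
// independent of the DOM, locale, or network.
function renderCopy(state: WidgetState, errorName?: string): View {
  switch (state) {
    case "interactive": return { message: "Verify you are human" };
    case "verifying":   return { message: "Verifying..." };
    case "success":     return { message: "Success!" };
    case "error":
      return { message: errorName ?? "Unable to connect to website", action: "Troubleshoot" };
  }
}
```

<p>Keeping the mapping pure like this is what lets unit tests cover every state-and-language combination cheaply, leaving end-to-end tests to verify the real rendering.</p>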
    <div>
      <h4><b>The outcome</b></h4>
      <a href="#the-outcome">
        
      </a>
    </div>
    <p>What began as a set of technical constraints turned into an opportunity to build a more robust, inclusive, and well-tested UI system. Working with fewer abstractions and closer to the browser primitives forced us to rethink assumptions, improve our internationalization strategy, and raise the overall quality bar.</p><p>The result is not just a solution that works, but one we trust. And that trust is what allows us to keep improving, even when centering a div turns out to be the easy part.</p>
    <div>
      <h2>Part 3: The impact</h2>
      <a href="#part-3-the-impact">
        
      </a>
    </div>
    <p>Designing for billions of people is a responsibility we take seriously. At this scale, it is essential to leverage measurable data to tell us the real impact of our design choices. As we prepare to roll out these changes, we are focusing on <b>five key metrics</b> that will tell us if we’ve truly succeeded in making the Internet’s most-seen UI more human.</p>
    <div>
      <h4><b>1. Challenge Completion Rate</b></h4>
      <a href="#1-challenge-completion-rate">
        
      </a>
    </div>
<p>Our primary north star is the <b>Challenge Solve Rate (CSR)</b>: the percentage of issued challenges that are successfully completed. By moving away from technical jargon like "intermediary caching" and toward simple, actionable labels like "Incorrect device time," we expect a significant uptick in CSR. A higher CSR doesn't mean we're being easier on bots; it means we’re removing the hurdles that were accidentally tripping up legitimate human users.</p>
    <div>
      <h4><b>2. Time to Complete</b></h4>
      <a href="#2-time-to-complete">
        
      </a>
    </div>
<p>Every second a user spends on a challenge page is a second they aren't getting the information they need. Our research showed that users were often paralyzed when confronted with a wall of red text. With our new scannable, neutral-color design, we are tracking <b>Time to Complete</b> to ensure users can identify and resolve issues in seconds rather than minutes.</p>
    <div>
      <h4><b>3. Abandonment Rate Changes</b></h4>
      <a href="#3-abandonment-rate-changes">
        
      </a>
    </div>
    <p>In the past, our liberal use of "saturated red" caused a visceral reaction: users felt they had failed and simply gave up. By reserving red only for icons and using a unified architecture, we aim to reduce Abandonment Rates. We want users to feel empowered to click Troubleshoot rather than feeling powerless and clicking away.</p>
    <div>
      <h4><b>4. Support Ticket Volume</b></h4>
      <a href="#4-support-ticket-volume">
        
      </a>
    </div>
    <p>One of the bigger shifts from a product perspective is our new Troubleshooting Modal. By providing clear, numbered steps directly within the widget, we are building self-service support into the UI. We expect this to result in a measurable decrease in support ticket volume for both our customers and our own internal teams.</p>
    <div>
      <h4><b>5. Social Sentiment</b></h4>
      <a href="#5-social-sentiment">
        
      </a>
    </div>
    <p>We know that security challenges are rarely loved, but they shouldn't be hated because they are confusing. We are monitoring <b>Social Sentiment</b> across community forums, feedback reports, and social channels to see if the conversation shifts from "this widget is broken" to "I had an issue, but I fixed it".</p><p>As a Product Manager, my goal is often invisible security — the best challenge is the one the user never sees. But when a challenge <i>must</i> be seen, it should be an assistant, not a bouncer. This redesign proves that <b>AAA accessibility</b> and <b>high-security standards</b> aren't in competition; they are two sides of the same coin. By unifying the architecture of Turnstile and Challenge Pages, we’ve built a foundation that allows us to iterate faster and protect the Internet more humanely than ever before.</p>
    <div>
      <h2>Looking ahead</h2>
      <a href="#looking-ahead">
        
      </a>
    </div>
    <p>This redesign is a foundation, not a finish line.</p><p>We're continuing to monitor how users interact with the new experience, and we're committed to iterating based on what we learn. The feedback mechanisms we've built into the new design — the ones that actually help users troubleshoot, rather than just asking them to report problems — will give us richer insights than we've ever had before.</p><p>We're also watching how the security landscape evolves. As bot attacks grow more sophisticated, and as AI continues to blur the line between human and automated behavior, the challenge of verification will only get harder. Our job is to stay ahead — to keep improving security without making the human experience worse.</p><p>If you encounter the new Turnstile or Challenge Pages and have feedback, we want to hear it. Reach out through our <a href="https://community.cloudflare.com/"><u>community forums</u></a> or use the feedback mechanisms built into the experience itself.</p> ]]></content:encoded>
            <category><![CDATA[Security Week]]></category>
            <category><![CDATA[Turnstile]]></category>
            <category><![CDATA[Challenge Page]]></category>
            <category><![CDATA[Design]]></category>
            <category><![CDATA[Product Design]]></category>
            <category><![CDATA[User Research]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[Bot Management]]></category>
            <category><![CDATA[WAF]]></category>
            <category><![CDATA[Engineering]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Accessibility]]></category>
            <guid isPermaLink="false">19fiiQAG0XsaS9p0daOBus</guid>
            <dc:creator>Leo Bacevicius</dc:creator>
            <dc:creator>Ana Foppa</dc:creator>
            <dc:creator>Marina Elmore</dc:creator>
        </item>
        <item>
            <title><![CDATA[Beyond IP lists: a registry format for bots and agents]]></title>
            <link>https://blog.cloudflare.com/agent-registry/</link>
            <pubDate>Thu, 30 Oct 2025 22:00:00 GMT</pubDate>
            <description><![CDATA[ We propose an open registry format for Web Bot Auth to move beyond IP-based identity. This allows any origin to discover and verify cryptographic keys for bots, fostering a decentralized and more trustworthy ecosystem. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>As bots and agents start <a href="https://blog.cloudflare.com/web-bot-auth/"><u>cryptographically signing their requests</u></a>, there is a growing need for website operators to learn public keys as they are setting up their service. I might be able to find the public key material for well-known fetchers and crawlers, but what about the next 1,000 or next 1,000,000? And how do I find their public key material in order to verify that they are who they say they are? This problem is called <i>discovery.</i></p><p>We share this problem with <a href="https://aws.amazon.com/bedrock/agentcore/"><u>Amazon Bedrock AgentCore</u></a>, a comprehensive agentic platform to build, deploy and operate highly capable agents at scale, and their <a href="https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/browser-tool.html"><u>AgentCore Browser</u></a>, a fast, secure, cloud-based browser runtime to enable AI agents to interact with websites at scale. The AgentCore team wants to make it easy for each of their customers to sign <i>their own requests</i>, so that Cloudflare and other operators of CDN infrastructure see agent signatures from individual agents rather than AgentCore as a monolith. (Note: this method does not identify individual users.) In order to do this, Cloudflare needed a way to ingest and register the public keys of AgentCore’s customers at scale. </p><p>In this blog post, we propose a registry of bots and agents as a way to easily discover them on the Internet. We also outline how <a href="https://blog.cloudflare.com/web-bot-auth/"><u>Web Bot Auth</u></a> can be expanded with a registry format. 
Similar to IP lists that can be authored by anyone and easily imported, the <a href="https://datatracker.ietf.org/doc/draft-meunier-webbotauth-registry/"><u>registry format</u></a> is a list of URLs at which to retrieve agent keys and can be authored and imported easily.</p><p>We believe such registries should foster and strengthen an open ecosystem of curators that website operators can trust.</p>
    <div>
      <h2>A need for more trustworthy authentication</h2>
      <a href="#a-need-for-more-trustworthy-authentication">
        
      </a>
    </div>
    <p>In May, we introduced a protocol proposal called <a href="https://blog.cloudflare.com/web-bot-auth/"><u>Web Bot Auth</u></a>, which describes how bot and agent developers can cryptographically sign requests coming from their infrastructure. </p><p>There have now been multiple implementations of the proposed protocol, from <a href="https://vercel.com/changelog/vercels-bot-verification-now-supports-web-bot-auth"><u>Vercel</u></a> to <a href="https://changelog.shopify.com/posts/authorize-custom-crawlers-and-tools-with-new-crawler-access-keys"><u>Shopify</u></a> to <a href="https://www.cloudflare.com/press/press-releases/2025/cloudflare-collaborates-with-leading-payments-companies-to-secure-and-enable-agentic-commerce/"><u>Visa</u></a>. It has been actively <a href="https://mailarchive.ietf.org/arch/browse/web-bot-auth/"><u>discussed</u></a> and contributions have been made. Web Bot Auth marks a first step towards moving from brittle identification, like IPs and user agents, to more trustworthy cryptographic authentication. However, like IP addresses, cryptographic keys are a pseudonymous form of identity. If you operate a website without the scale and reach of large CDNs, how do you discover the public keys of known crawlers?</p><p>The first protocol proposal suggested one approach: bot operators would provide a newly-defined HTTP header, <code>Signature-Agent</code>, that refers to an HTTP endpoint hosting their keys. Similar to IP addresses, the default is to allow all, but if a particular operator is making too many requests, you can start taking action: apply a rate limit, contact the operator, etc.</p><p>Here’s an example from <a href="https://help.shopify.com/en/manual/promoting-marketing/seo/crawling-your-store"><u>Shopify's online store</u></a>:</p>
            <pre><code>Signature-Agent: "https://shopify.com"</code></pre>
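<p>To make discovery concrete, here is a minimal sketch (in Python) of how an origin could derive the key-directory URL from a <code>Signature-Agent</code> header value, assuming the <code>/.well-known/http-message-signatures-directory</code> path used by the registry entries shown later in this post. Real header values are structured fields and warrant stricter parsing; this is illustrative only.</p>

```python
# Sketch: map a Signature-Agent header value to the key-directory URL an
# origin would fetch to verify signatures. The well-known path below is the
# one used by the registry entries in this post; parsing is simplified.

def directory_url(signature_agent: str) -> str:
    """Derive the key-directory URL from a Signature-Agent header value."""
    origin = signature_agent.strip().strip('"').rstrip("/")
    return origin + "/.well-known/http-message-signatures-directory"
```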
            
    <div>
      <h2>A registry format</h2>
      <a href="#a-registry-format">
        
      </a>
    </div>
    <p>With all that in mind, we come to the following problem. How can Cloudflare ensure customers have control over the traffic they want to allow, with sensible defaults, while fostering an open curation ecosystem that doesn’t lock in customers or small origins?</p><p>Such an ecosystem exists for lists of IP addresses (e.g. <a href="https://github.com/antoinevastel/avastel-bot-ips-lists/blob/master/avastel-proxy-bot-ips-1day.txt"><u>avastel-bot-ips-lists</u></a>) and robots.txt (e.g. <a href="https://github.com/ai-robots-txt/ai.robots.txt"><u>ai-robots-txt</u></a>). For both, you can find canonical lists on the Internet to easily configure your website to allow or disallow traffic from those IPs. They provide direct configuration for your nginx or haproxy, and you can use them to configure your Cloudflare account. For instance, I could import the robots.txt below:</p>
            <pre><code>User-agent: MyBadBot
Disallow: /</code></pre>
            <p>This is where the registry format comes in, providing a list of URLs pointing to Signature Agent keys:</p>
            <pre><code># AI Crawler
https://chatgpt.com/.well-known/http-message-signatures-directory
https://autorag.ai.cloudflare.com/.well-known/http-message-signatures-directory
 
# Test signature agent card
https://http-message-signatures-example.research.cloudflare.com/.well-known/http-message-signatures-directory</code></pre>
            <p>And that's it. A registry could contain a list of all known signature agents, a curated list for academic research agents, one for search agents, etc.</p><p>Anyone can maintain and host these lists. As with IP or robots.txt lists, you can host such a registry on any public file system. This means you can have a repository on GitHub, put the file on Cloudflare R2, or send it as an email attachment. Cloudflare intends to provide one of the first instances of this registry, so that others can contribute to it or reference it when building their own. </p>
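<p>Consuming a registry is deliberately trivial. As a rough sketch, a consumer skips blank lines and <code>#</code> comments and treats every remaining line as a key-directory URL to fetch; anything beyond that (grouping by comment headings, for instance) is up to the implementation:</p>

```python
def parse_registry(text: str) -> list[str]:
    """Parse the line-oriented registry format: '#' lines are comments,
    blank lines are ignored, every other line is a key-directory URL."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        urls.append(line)
    return urls
```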
    <div>
      <h2>Learn more about an incoming request</h2>
      <a href="#learn-more-about-an-incoming-request">
        
      </a>
    </div>
    <p>Knowing the Signature-Agent is great, but not sufficient. For instance, to be a verified bot, Cloudflare requires a contact method, in case requests from that infrastructure suddenly fail or change format in a way that causes unexpected errors upstream. In fact, there is a lot of information an origin might want to know: a name for the operator, a contact method, a logo, the expected crawl rate, etc.</p><p>Therefore, to complement the registry format, we have proposed a <a href="https://thibmeu.github.io/http-message-signatures-directory/draft-meunier-webbotauth-registry.html#name-signature-agent-card"><u>signature-agent card format</u></a> that extends the JWKS directory (<a href="https://www.rfc-editor.org/rfc/rfc7517"><u>RFC 7517</u></a>) with additional metadata. Similar to an old-fashioned contact card, it includes all the important information someone might want to know about your agent or crawler. </p><p>We provide an example below for illustration. Note that the fields may change: introducing jwks-uri, making the logo more descriptive, etc.</p>
            <pre><code>{
  "client_name": "Example Bot",
  "client_uri": "https://example.com/bot/about.html",
  "logo_uri": "https://example.com/",
  "contacts": ["mailto:bot-support@example.com"],
  "expected-user-agent": "Mozilla/5.0 ExampleBot",
  "rfc9309-product-token": "ExampleBot",
  "rfc9309-compliance": ["User-Agent", "Allow", "Disallow", "Content-Usage"],
  "trigger": "fetcher",
  "purpose": "tdm",
  "targeted-content": "Cat pictures",
  "rate-control": "429",
  "rate-expectation": "avg=10rps;max=100rps",
  "known-urls": ["/", "/robots.txt", "*.png"],
  "keys": [{
    "kty": "OKP",
    "crv": "Ed25519",
    "kid": "NFcWBst6DXG-N35nHdzMrioWntdzNZghQSkjHNMMSjw",
    "x": "JrQLj5P_89iXES9-vFgrIy29clF9CC_oPPsw3c5D0bs",
    "use": "sig",
    "nbf": 1712793600,
    "exp": 1715385600
  }]
}</code></pre>
            
    <div>
      <h2>Operating a registry</h2>
      <a href="#operating-a-registry">
        
      </a>
    </div>
    <p>Amazon Bedrock AgentCore, an agentic platform for building and deploying AI agents at scale, adopted Web Bot Auth for its AgentCore Browser service (learn more in <a href="https://aws.amazon.com/blogs/machine-learning/reduce-captchas-for-ai-agents-browsing-the-web-with-web-bot-auth-preview-in-amazon-bedrock-agentcore-browser/">their post</a>). AgentCore Browser intends to transition from the service signing key currently available in its public preview to customer-specific keys once the protocol matures. Cloudflare and other operators of origin protection services will be able to see and validate signatures from individual AgentCore customers rather than AgentCore as a whole.</p><p>Cloudflare also offers a registry for bots and agents it trusts, provided through Radar. It uses the <a href="https://assets.radar.cloudflare.com/bots/signature-agent-registry.txt"><u>registry format</u></a> to allow for the consumption of bots trusted by Cloudflare on your server.</p><p>You can use these registries today – we’ve provided a demo in Go for <a href="https://caddyserver.com/"><u>Caddy server</u></a> that imports keys from multiple registries. It’s on <a href="https://github.com/cloudflare/web-bot-auth/pull/52"><u>cloudflare/web-bot-auth</u></a>. The configuration looks like this:</p>
            <pre><code>:8080 {
    route {
        # httpsig middleware is used here
        httpsig {
            registry "http://localhost:8787/test-registry.txt"
        # You can specify multiple registries. All tags will be checked independently
            registry "http://example.test/another-registry.txt"
        }

        # Responds if signature is valid
        handle {
            respond "Signature verification succeeded!" 200
        }
    }
}</code></pre>
            <p>There are several reasons why you might want to operate and curate a registry leveraging the <a href="https://www.ietf.org/archive/id/draft-meunier-webbotauth-registry-01.html#name-signature-agent-card"><u>Signature Agent Card format</u></a>:</p><ol><li><p><b>Monitor incoming </b><code><b>Signature-Agent</b></code><b>s.</b> This should allow you to collect signature-agent cards of agents reaching out to your domain.</p></li><li><p><b>Import them from existing registries, and categorize them yourself.</b> There could be a general registry constructed from the monitoring step above, but registries might be more useful with more categories.</p></li><li><p><b>Establish direct relationships with agents.</b> Cloudflare does this for its<a href="https://radar.cloudflare.com/bots#verified-bots"> <u>bot registry</u></a> for instance, or you might use a public GitHub repository where people can open issues.</p></li><li><p><b>Learn from your users.</b> If you offer a security service, allowing your customers to specify the registries/signature-agents they want to let through allows you to gain valuable insight.</p></li></ol>
    <div>
      <h2>Moving forward</h2>
      <a href="#moving-forward">
        
      </a>
    </div>
    <p>As cryptographic authentication for bots and agents grows, the need for discovery increases.</p><p>With the introduction of a lightweight format and specification to attach metadata to Signature-Agent, and curate them in the form of registries, we begin to address this need. The HTTP Message Signature directory format is being expanded to include some self-certified metadata, and the registry format sustains a curation ecosystem.</p><p>Down the line, we predict that clients and origins will choose the signature-agents they trust, use a common format to migrate their configuration between CDN providers, and rely on third-party registries for curation. We are working towards integrating these capabilities into our bot management and rule engines.</p><p>If you’d like to experiment, our demo is on <a href="https://github.com/cloudflare/web-bot-auth/pull/52"><u>GitHub</u></a>. If you’d like to help us, <a href="https://blog.cloudflare.com/cloudflare-1111-intern-program/"><u>we’re hiring 1,111 interns</u></a> over the course of next year, and have <a href="https://www.cloudflare.com/careers/"><u>open positions</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[Bots]]></category>
            <guid isPermaLink="false">3VeTsp2f9v3B1QZF0oglUV</guid>
            <dc:creator>Thibault Meunier</dc:creator>
            <dc:creator>Maxime Guerreiro</dc:creator>
        </item>
        <item>
            <title><![CDATA[One IP address, many users: detecting CGNAT to reduce collateral effects]]></title>
            <link>https://blog.cloudflare.com/detecting-cgn-to-reduce-collateral-damage/</link>
            <pubDate>Wed, 29 Oct 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ IPv4 scarcity drives widespread use of Carrier-Grade Network Address Translation, a practice in ISPs and mobile networks that places many users behind each IP address, along with their collected activity and volumes of traffic. We introduce the method we’ve developed to detect large-scale IP sharing globally and mitigate the issues that result.  ]]></description>
            <content:encoded><![CDATA[ <p>IP addresses have historically been treated as stable identifiers for non-routing purposes such as for geolocation and security operations. Many operational and security mechanisms, such as blocklists, rate-limiting, and anomaly detection, rely on the assumption that a single IP address represents a cohesive<b>, </b>accountable<b> </b>entity or even, possibly, a specific user or device.</p><p>But the structure of the Internet has changed, and those assumptions can no longer be made. Today, a single IPv4 address may represent hundreds or even thousands of users due to widespread use of <a href="https://en.wikipedia.org/wiki/Carrier-grade_NAT"><u>Carrier-Grade Network Address Translation (CGNAT)</u></a>, VPNs, and proxy<b> </b>middleboxes. This concentration of traffic can result in <a href="https://blog.cloudflare.com/consequences-of-ip-blocking/"><u>significant collateral damage</u></a> – especially to users in developing regions of the world – when security mechanisms are applied without taking into account the multi-user nature of IPs.</p><p>This blog post presents our approach to detecting large-scale IP sharing globally. We describe how we <a href="https://www.cloudflare.com/learning/ai/how-to-secure-training-data-against-ai-data-leaks/">build reliable training data</a>, and how detection can help avoid unintentional bias affecting users in regions where IP sharing is most prevalent. Arguably it's those regional variations that motivate our efforts more than any other. </p>
    <div>
      <h2>Why this matters: Potential socioeconomic bias</h2>
      <a href="#why-this-matters-potential-socioeconomic-bias">
        
      </a>
    </div>
    <p>Our work was initially motivated by a simple observation: CGNAT is likely an unseen source of bias on the Internet. Those biases would be more pronounced wherever there are many users and few addresses, such as in developing regions. And these biases can have profound implications for user experience, network operations, and digital equity.</p><p>This is understandable for many reasons, not least necessity. Countries in the developing world often have significantly fewer available IPs, and more users. The disparity is a historical artifact of how the Internet grew: the largest blocks of IPv4 addresses were allocated decades ago, primarily to organizations in North America and Europe, leaving a much smaller pool for regions where Internet adoption expanded later. </p><p>To visualize the IPv4 allocation gap, we plot country-level ratios of users to IP addresses in the figure below. We take online user estimates from the <a href="https://data.worldbank.org/indicator/IT.NET.USER.ZS"><u>World Bank Group</u></a> and the number of IP addresses in a country from Regional Internet Registry (RIR) records. The colour-coded map that emerges shows that the usage of each IP address is more concentrated in regions that generally have poor Internet penetration. For example, large portions of Africa and South Asia appear with the highest user-to-IP ratios. Conversely, the lowest user-to-IP ratios appear in Australia, Canada, Europe, and the USA — the very countries that otherwise have the highest Internet user penetration numbers.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2YBdqPx0ALt7pY7rmQZyLQ/049922bae657a715728700c764c4af16/BLOG-3046_2.png" />
          </figure><p>The scarcity of IPv4 address space means that regional differences can only worsen as Internet penetration rates increase. A natural consequence of increased demand in developing regions is that ISPs will rely even more heavily on CGNAT, a trend compounded by the fact that CGNAT is already common in the mobile networks that users in developing regions depend on. All of this means that <a href="https://datatracker.ietf.org/doc/html/rfc7021"><u>actions known to be based</u></a> on IP reputation or behaviour would disproportionately affect developing economies. </p><p>Cloudflare is a global network in a global Internet. We are sharing our methodology so that others might benefit from our experience and help to mitigate unintended effects. First, let’s better understand CGNAT.</p>
    <div>
      <h3>When one IP address serves multiple users</h3>
      <a href="#when-one-ip-address-serves-multiple-users">
        
      </a>
    </div>
    <p>Large-scale IP address sharing is primarily achieved through two distinct methods. The first, and more familiar, involves services like VPNs and proxies. These tools emerge from a need to secure corporate networks or improve users' privacy, but can be used to circumvent censorship or even improve performance. Their deployment also tends to concentrate traffic from many users onto a small set of exit IPs. Typically, individuals are aware they are using such a service, whether for personal use or as part of a corporate network.</p><p>Separately, another form of large-scale IP sharing often goes unnoticed by users: <a href="https://en.wikipedia.org/wiki/Carrier-grade_NAT"><u>Carrier-Grade NAT (CGNAT)</u></a>. One way to explain CGNAT is to start with a much smaller version of network address translation (NAT) that very likely exists in your home broadband router, formally called Customer Premises Equipment (CPE), which translates unseen private addresses in the home to visible and routable addresses in the ISP. Once traffic leaves the home, an ISP may add an additional carrier-level address translation that causes many households or unrelated devices to appear behind a single IP address.</p><p>The crucial difference between these forms of large-scale IP sharing is user choice: carrier-grade address sharing is not a user choice, but is configured directly by Internet Service Providers (ISPs) within their access networks. Users are not aware that CGNATs are in use. </p><p>The primary driver for this technology, understandably, is the exhaustion of the IPv4 address space. IPv4's 32-bit architecture supports only 4.3 billion unique addresses — a capacity that, while once seemingly vast, has been completely outpaced by the Internet's explosive growth. By the early 2010s, Regional Internet Registries (RIRs) had depleted their pools of unallocated IPv4 addresses. 
This left ISPs unable to easily acquire new address blocks, forcing them to maximize the use of their existing allocations.</p><p>While the long-term solution is the transition to IPv6, CGNAT emerged as the immediate, practical workaround. Instead of assigning a unique public IP address to each customer, ISPs use CGNAT to place multiple subscribers behind a single, shared IP address. This practice solves the problem of IP address scarcity. Since translated addresses are not publicly routable, CGNATs have also had the positive side effect of protecting many home devices that might be vulnerable to compromise. </p><p>CGNATs also create significant operational fallout stemming from the fact that hundreds or even thousands of clients can appear to originate from a single IP address. <b>This means an IP-based security system may inadvertently block or throttle large groups of users as a result of a single user behind the CGNAT engaging in malicious activity.</b></p><p>This isn't a new or niche issue. It has been recognized for years by the Internet Engineering Task Force (IETF), the organization that develops the core technical standards for the Internet. These standards, known as Requests for Comments (RFCs), act as the official blueprints for how the Internet should operate. <a href="https://www.rfc-editor.org/rfc/rfc6269.html"><u>RFC 6269</u></a>, for example, discusses the challenges of IP address sharing, while <a href="https://datatracker.ietf.org/doc/html/rfc7021"><u>RFC 7021</u></a> examines the impact of CGNAT on network applications. 
Both explain that traditional abuse-mitigation techniques, such as blocklisting or rate-limiting, assume a one-to-one relationship between IP addresses and users: when malicious activity is detected, the offending IP address can be blocked to prevent further abuse.</p><p>In shared IPv4 environments, such as those using CGNAT or other address-sharing techniques, this assumption breaks down because multiple subscribers can appear under the same public IP. Blocking the shared IP therefore penalizes many innocent users along with the abuser. In 2015, Ofcom, the UK's telecommunications regulator, reiterated these concerns in a <a href="https://oxil.uk/research/mc159-report-on-the-implications-of-carrier-grade-network-address-translators-final-report"><u>report</u></a> on the implications of CGNAT, noting that, “In the event that an IPv4 address is blocked or blacklisted as a source of spam, the impact on a CGNAT would be greater, potentially affecting an entire subscriber base.” </p><p>While the hope was that CGNAT would only be a temporary solution until the eventual switch to IPv6, as the old proverb says, nothing is more permanent than a temporary solution. While IPv6 deployment continues to lag, <a href="https://blog.apnic.net/2022/01/19/ip-addressing-in-2021/"><u>CGNAT deployments have become increasingly common</u></a>, and so have the related problems. </p>
    <div>
      <h2>CGNAT detection at Cloudflare</h2>
      <a href="#cgnat-detection-at-cloudflare">
        
      </a>
    </div>
    <p>To enable a fairer treatment of users behind CGNAT IPs by security techniques that rely on IP reputation, our goal is to identify large-scale IP sharing. This allows traffic filtering to be better calibrated and collateral damage minimized. Additionally, we want to distinguish CGNAT IPs from other large-scale sharing (LSS) IP technologies, such as VPNs and proxies, because we may need to take different approaches to different kinds of IP-sharing technologies.</p><p>To do this, we decided to take advantage of Cloudflare’s extensive view of the active IP clients, and build a supervised learning classifier that would distinguish CGNAT and VPN/proxy IPs from IPs that are allocated to a single subscriber (non-LSS IPs), based on behavioural characteristics. The figure below shows an overview of our supervised classifier: </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7tFXZByKRCYxVaAFDG0Xda/d81e7f09b5d12e03e39c266696df9cc3/BLOG-3046_3.png" />
          </figure><p>While our classification approach is straightforward, a significant challenge is the lack of a reliable, comprehensive, and labeled dataset of CGNAT IPs for our training dataset.</p>
    <div>
      <h3>Detecting CGNAT using public data sources </h3>
      <a href="#detecting-cgnat-using-public-data-sources">
        
      </a>
    </div>
    <p>Detection begins by building an initial dataset of IPs believed to be associated with CGNAT. Cloudflare has vast HTTP and traffic logs. Unfortunately there is no signal or label in any request to indicate what is or is not a CGNAT. </p><p>To build an extensive labelled dataset to train our ML classifier, we employ a combination of network measurement techniques, as described below. We rely on public data sources to help disambiguate an initial set of large-scale shared IP addresses from others in Cloudflare’s logs.   </p>
    <div>
      <h4>Distributed Traceroutes</h4>
      <a href="#distributed-traceroutes">
        
      </a>
    </div>
    <p>The presence of a client behind CGNAT can often be inferred through traceroute analysis. CGNAT requires ISPs to insert a NAT step that typically uses the Shared Address Space (<a href="https://datatracker.ietf.org/doc/html/rfc6598"><u>RFC 6598</u></a>) after the customer premises equipment (CPE). By running a traceroute from the client to its own public IP and examining the hop sequence, the appearance of an address within 100.64.0.0/10 between the first private hop (e.g., 192.168.1.1) and the public IP is a strong indicator of CGNAT.</p><p>Traceroute can also reveal multi-level NAT, which CGNAT requires, as shown in the diagram below. If the ISP assigns the CPE a private <a href="https://datatracker.ietf.org/doc/html/rfc1918"><u>RFC 1918</u></a> address that appears right after the local hop, this indicates at least two NAT layers. While ISPs sometimes use private addresses internally without CGNAT, observing private or shared ranges immediately downstream combined with multiple hops before the public IP strongly suggests CGNAT or equivalent multi-layer NAT.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/57k4gwGCHcPggIWtSy36HU/6cf8173c1a4c568caa25a1344a516e9e/BLOG-3046_4.png" />
          </figure><p>Although traceroute accuracy depends on router configurations, detecting private and shared IP ranges is a reliable way to identify large-scale IP sharing. We apply this method to distributed traceroutes from over 9,000 RIPE Atlas probes to classify hosts as behind CGNAT, single-layer NAT, or no NAT.</p>
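<p>The hop-sequence heuristic described above is straightforward to express in code. A simplified sketch, using the RFC 1918 private ranges and the RFC 6598 shared range (real traceroutes also contain timeouts and non-responding hops, which are omitted here):</p>

```python
import ipaddress

SHARED = ipaddress.ip_network("100.64.0.0/10")   # RFC 6598 shared address space
RFC1918 = [ipaddress.ip_network(n)
           for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def classify_hops(hops: list[str]) -> str:
    """Classify a traceroute from a client to its own public IP:
    a shared-space hop, or more than one private hop, suggests CGNAT."""
    addrs = [ipaddress.ip_address(h) for h in hops]
    shared = any(a in SHARED for a in addrs)
    private = sum(any(a in net for net in RFC1918) for a in addrs)
    if shared or private > 1:
        return "cgnat"        # multi-layer NAT or shared space observed
    if private == 1:
        return "single-nat"   # only the home CPE translates
    return "no-nat"
```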
    <div>
      <h4>Scraping WHOIS and PTR records</h4>
      <a href="#scraping-whois-and-ptr-records">
        
      </a>
    </div>
    <p>Many operators encode metadata about their IPs in the corresponding reverse DNS pointer (PTR) record that can signal administrative attributes and geographic information. We first query the DNS for PTR records for the full IPv4 space and then filter for a set of known keywords from the responses that indicate a CGNAT deployment. For example, each of the following three records matches a keyword (<code>cgnat</code>, <code>cgn</code> or <code>lsn</code>) used to detect CGNAT address space:</p>
            <pre><code>node-lsn.pool-1-0.dynamic.totinternet.net
103-246-52-9.gw1-cgnat.mobile.ufone.nz
cgn.gsw2.as64098.net</code></pre>
            <p>WHOIS and Internet Routing Registry (IRR) records may also contain organizational names, remarks, or allocation details that reveal whether a block is used for CGNAT pools or residential assignments. </p><p>Given that both PTR and WHOIS records may be manually maintained, and therefore may be stale, we sanitize the extracted data by validating that the corresponding ISPs indeed use CGNAT, based on customer and market reports. </p>
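<p>The keyword filter can be sketched as follows. Tokenizing hostnames on dots and hyphens avoids matching substrings of unrelated words; this is illustrative only, and as noted above the extracted matches are still validated against customer and market reports:</p>

```python
import re

CGNAT_KEYWORDS = {"cgnat", "cgn", "lsn"}  # keywords named in this post

def looks_like_cgnat(ptr: str) -> bool:
    """True if a PTR record contains a CGNAT-indicating token."""
    tokens = re.split(r"[.\-]", ptr.lower())
    return any(t in CGNAT_KEYWORDS for t in tokens)
```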
    <div>
      <h4>Collecting VPN and proxy IPs </h4>
      <a href="#collecting-vpn-and-proxy-ips">
        
      </a>
    </div>
    <p>Compiling a list of VPN and proxy IPs is more straightforward, as we can directly find such IPs in public service directories for anonymizers. We also subscribe to multiple VPN providers, and we collect the IPs allocated to our clients by connecting to a unique HTTP endpoint under our control. </p>
    <div>
      <h2>Modeling CGNAT with machine learning</h2>
      <a href="#modeling-cgnat-with-machine-learning">
        
      </a>
    </div>
    <p>By combining the above techniques, we accumulated a labeled dataset of more than 200K CGNAT IPs, 180K VPN &amp; proxy IPs, and close to 900K IPs that are not LSS IPs. These were the entry points to modeling with machine learning.</p>
    <div>
      <h3>Feature selection</h3>
      <a href="#feature-selection">
        
      </a>
    </div>
    <p>Our hypothesis was that aggregated activity from CGNAT IPs is distinguishable from activity generated by other, non-CGNAT IP addresses. Our feature extraction is an evaluation of that hypothesis — since networks do not disclose CGNAT and other uses of IPs, the quality of our inference is strictly dependent on our confidence in the training data. We claim the key discriminator is diversity, not just volume. For example, VM-hosted scanners may generate high numbers of requests, but with low information diversity. Similarly, globally routable CPEs may have individually unique characteristics, but with volumes that are less likely to be caught at lower sampling rates.</p><p>In our feature extraction, we parse a 1% sample of HTTP request logs for distinguishing features of the IPs compiled in our reference set, and the same features for the corresponding /24 prefix (namely IPs with the same first 24 bits in common). We analyse the features for each VPN, proxy, CGNAT, or non-LSS IP. We find that features from the following broad categories are key discriminators for the different types of IPs in our training dataset:</p><ul><li><p><b>Client-side signals:</b> We analyze the aggregate properties of clients connecting from an IP. A large, diverse user base (like on a CGNAT) naturally presents a much wider statistical variety of client behaviors and connection parameters than a single-tenant server or a small business proxy.</p></li><li><p><b>Network and transport-level behaviors:</b> We examine traffic at the network and transport layers. The way a large-scale network appliance (like a CGNAT) manages and routes connections often leaves subtle, measurable artifacts in its traffic patterns, such as in port allocation and observed network timing.</p></li><li><p><b>Traffic volume and destination diversity:</b> We also model the volume and "shape" of the traffic. 
An IP representing thousands of independent users will, on average, generate a higher volume of requests and target a much wider, less correlated set of destinations than an IP representing a single user.</p></li></ul><p>Crucially, to distinguish CGNAT from VPNs and proxies (which is absolutely necessary for calibrated security filtering), we had to aggregate these features at two different scopes: per IP and per /24 prefix. CGNAT IPs are typically allocated in large blocks, whereas VPN IPs are more scattered across different IP prefixes. </p>
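<p>The two-scope aggregation can be sketched as follows: each sampled request contributes to counters for its source IP and for the covering /24 prefix. This is a minimal sketch using request counts; the real feature set is far richer:</p>

```python
import ipaddress
from collections import defaultdict

def slash24(ip: str) -> str:
    """Return the /24 prefix covering an IPv4 address."""
    return str(ipaddress.ip_network(ip + "/24", strict=False))

def aggregate(sampled_ips: list[str]):
    """Accumulate per-IP and per-/24-prefix request counts."""
    per_ip, per_prefix = defaultdict(int), defaultdict(int)
    for ip in sampled_ips:
        per_ip[ip] += 1
        per_prefix[slash24(ip)] += 1
    return per_ip, per_prefix
```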
    <div>
      <h3>Classification results</h3>
      <a href="#classification-results">
        
      </a>
    </div>
    <p>We compute the above features from HTTP logs over 24-hour intervals to increase data volume and reduce noise due to DHCP IP reallocation. The dataset is split into 70% training and 30% testing sets with disjoint /24 prefixes, and VPN and proxy labels are merged due to their similarity and lower operational importance compared to CGNAT detection.</p><p>Then we train a multi-class <a href="https://xgboost.readthedocs.io/en/stable/"><u>XGBoost</u></a> model with class weighting to address imbalance, assigning each IP to the class with the highest predicted probability. XGBoost is well-suited for this task because it efficiently handles large feature sets, offers strong regularization to prevent overfitting, and delivers high accuracy with limited parameter tuning. The classifier achieves 0.98 accuracy, 0.97 weighted F1, and 0.04 log loss. The figure below shows the confusion matrix of the classification.</p>
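<p>The prefix-disjoint 70/30 split and the class weighting described above can be sketched as follows; the <code>prefix</code>/<code>label</code> record shape and the inverse-frequency weighting scheme are illustrative assumptions, not the exact training code:</p>

```python
import random
from collections import Counter

def disjoint_prefix_split(samples, train_frac=0.7, seed=0):
    # Keep every IP of a /24 on the same side of the split, so the model
    # cannot score well simply by memorizing prefixes it saw in training.
    prefixes = sorted({s["prefix"] for s in samples})
    random.Random(seed).shuffle(prefixes)
    train_prefixes = set(prefixes[: int(len(prefixes) * train_frac)])
    train = [s for s in samples if s["prefix"] in train_prefixes]
    test = [s for s in samples if s["prefix"] not in train_prefixes]
    return train, test

def class_weights(labels):
    # Inverse-frequency weights: rarer classes get proportionally
    # larger weights, countering class imbalance during training.
    counts = Counter(labels)
    return {c: len(labels) / (len(counts) * n) for c, n in counts.items()}
```

<p>Splitting on prefixes rather than on individual IPs is what makes the reported test metrics meaningful, since IPs within one /24 share many prefix-level features.</p>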
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/26i81Pe0yjlftHfIDrjB5X/45d001447fc52001a25176c8036a92cb/BLOG-3046_5.png" />
          </figure><p>Our model is accurate for all three labels. The errors observed are mainly misclassifications of VPN/proxy IPs as CGNATs, mostly for VPN/proxy IPs whose /24 prefix is also shared by broadband users outside the proxy service. We also evaluate the prediction accuracy using <a href="https://scikit-learn.org/stable/modules/cross_validation.html"><u>k-fold cross validation</u></a>, which provides a more reliable estimate of performance by training and validating on multiple data splits, reducing variance and overfitting compared to a single train–test split. We use 10 folds and evaluate the <a href="https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc"><u>Area Under the ROC Curve</u></a> (AUC) and the multi-class log loss. We achieve a macro-average AUC of 0.9946 (σ=0.0069) and log loss of 0.0429 (σ=0.0115). Prefix-level features are the most important contributors to classification performance.</p>
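<p>For reference, the multi-class log loss reported here is the mean negative log-probability a model assigns to the true class; a minimal version:</p>

```python
import math

def multiclass_log_loss(y_true, probs, eps=1e-15):
    # y_true: true class indices; probs: per-sample probability vectors.
    # Probabilities are clipped at eps so log(0) can never occur.
    return -sum(math.log(max(p[y], eps))
                for y, p in zip(y_true, probs)) / len(y_true)
```

<p>A perfect classifier scores 0, while guessing uniformly over three classes scores ln 3 ≈ 1.10, so the reported 0.0429 indicates the model is both accurate and well calibrated.</p>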
    <div>
      <h3>Users behind CGNAT are more likely to be rate limited</h3>
      <a href="#users-behind-cgnat-are-more-likely-to-be-rate-limited">
        
      </a>
    </div>
    <p>The figure below shows the daily number of CGNAT IP inferences generated by our CDN-deployed detection service between December 17, 2024 and January 9, 2025. The number of inferences remains largely stable, with noticeable dips during weekends and holidays such as Christmas and New Year’s Day. This pattern reflects expected seasonal variations, as lower traffic volumes during these periods lead to fewer active IP ranges and reduced request activity.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7hiYstptHAK6tFQrM2kEsf/7f8192051156fc6eaecdf26a829ef11c/BLOG-3046_6.png" />
          </figure><p>Next, recall that actions that rely on IP reputation or behavior may be unduly influenced by CGNATs. One such example is bot detection. In an evaluation of our systems, we find that bot detection is resilient to those biases. However, we also learned that customers are more likely to rate limit IPs that we find are CGNATs.</p><p>We compare bot labels by measuring how often requests from CGNAT and non-CGNAT IPs are labeled as bots. <a href="https://www.cloudflare.com/resources/assets/slt3lc6tev37/JYknFdAeCVBBWWgQUtNZr/61844a850c5bba6b647d65e962c31c9c/BDES-863_Bot_Management_re_edit-_How_it_Works_r3.pdf"><u>Cloudflare assigns a bot score</u></a> to each HTTP request using CatBoost models trained on various request features, and these scores are then exposed through the Web Application Firewall (WAF), allowing customers to apply filtering rules. The median bot rate is nearly identical for CGNAT (4.8%) and non-CGNAT (4.7%) IPs. However, the mean bot rate is notably lower for CGNATs (7%) than for non-CGNATs (13.1%), indicating different underlying distributions. Non-CGNAT IPs show a much wider spread, with some reaching 100% bot rates, while CGNAT IPs cluster mostly below 15%. This suggests that non-CGNAT IPs tend to be dominated by either human or bot activity, whereas CGNAT IPs reflect mixed behavior from many end users, with human traffic prevailing.</p><p>Interestingly, despite bot scores that indicate traffic is more likely to be from human users, CGNAT IPs are subject to rate limiting three times more often than non-CGNAT IPs. This is likely because multiple users share the same public IP, increasing the chances that legitimate traffic gets caught by customers’ bot mitigation and firewall rules.</p><p>This tells us that users behind CGNAT IPs are indeed susceptible to collateral effects, and identifying those IPs allows us to tune mitigation strategies to disrupt malicious traffic quickly while reducing collateral impact on benign users behind the same address.</p>
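<p>A toy illustration, using made-up numbers rather than our measurements, of how the medians can agree while a heavy tail separates the means:</p>

```python
from statistics import mean, median

# Invented per-IP bot rates (fraction of requests labeled bot), shaped to
# mirror the distributions described above; NOT real measurement data.
cgnat = [0.03, 0.04, 0.05, 0.06, 0.12]       # mixed human traffic clusters low
non_cgnat = [0.02, 0.04, 0.05, 0.06, 1.00]   # single-tenant IPs can be pure bot

# Medians nearly match, but the 100%-bot outlier pulls the
# non-CGNAT mean far above the CGNAT mean.
```

<p>This is why comparing only medians would hide the distributional difference that distinguishes single-tenant IPs from shared CGNAT IPs.</p>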
    <div>
      <h2>A global view of the CGNAT ecosystem</h2>
      <a href="#a-global-view-of-the-cgnat-ecosystem">
        
      </a>
    </div>
    <p>One of the early motivations of this work was to understand whether our knowledge about IP addresses might hide a bias along socio-economic boundaries, and in particular whether an action on an IP address may disproportionately affect populations in developing nations, often referred to as the Global South. Identifying where different IPs exist is a necessary first step.</p><p>The map below shows the fraction of a country’s inferred CGNAT IPs over all IPs observed in the country. Regions with a greater reliance on CGNAT appear darker on the map. This view highlights how reliance on CGNAT varies geographically; for example, much of Africa and Central and Southeast Asia rely on CGNATs.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4P2XcuEebKfcYdCgykMWuP/4a0aa86bd619ba24533de6862175e919/BLOG-3046_7.png" />
          </figure><p>As further evidence of continental differences, the boxplot below shows the distribution of distinct user agents per IP across /24 prefixes inferred to be part of a CGNAT deployment in each continent. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7bqJSHexFuXFs4A8am1ibQ/591be6880e8f58c9d61b147aaf0487f5/BLOG-3046_8.png" />
          </figure><p>Notably, Africa has a much higher ratio of user agents to IP addresses than other regions, suggesting more clients share the same IP in African <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/"><u>ASNs</u></a>. So, not only do African ISPs rely more extensively on CGNAT, but the number of clients behind each CGNAT IP is also higher.</p><p>While the deployment rate of CGNAT per country is consistent with the users-per-IP ratio per country, it is not by itself sufficient to confirm deployment. The scatterplot below shows the number of users (according to <a href="https://stats.labs.apnic.net/aspop/"><u>APNIC user estimates</u></a>) and the number of IPs per ASN for ASNs where we detect CGNAT. ASNs that have fewer available IP addresses than their user base appear below the diagonal. Interestingly, the scatterplot indicates that many ASNs with more addresses than users still choose to deploy CGNAT. Presumably, these ASNs provide additional services beyond broadband, preventing them from dedicating their entire address pool to subscribers.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/34GKPlJWvkwudU5MbOtots/c883760a7c448b12995997e3e6e51979/BLOG-3046_9.png" />
          </figure>
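<p>The below-the-diagonal test amounts to checking whether an ASN's users-per-IP ratio exceeds 1; a hedged sketch with invented ASN figures (these are documentation-range ASNs and made-up counts, not real measurements):</p>

```python
# Hypothetical ASN records comparing estimated user population with the
# number of routable IPv4 addresses observed; values are invented.
asns = [
    {"asn": 64500, "users": 2_000_000, "ips": 150_000},
    {"asn": 64501, "users": 50_000, "ips": 120_000},
]

def users_per_ip(a):
    # A ratio above 1 (below the diagonal on the scatterplot) means the
    # ASN cannot give every user a unique public address.
    return a["users"] / a["ips"]

address_constrained = [a["asn"] for a in asns if users_per_ip(a) > 1]
```

<p>ASNs in <code>address_constrained</code> are the ones where CGNAT is effectively forced by address scarcity; the surprising observation above is that many ASNs outside this set deploy it anyway.</p>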
    <div>
      <h3>What this means for everyday Internet users</h3>
      <a href="#what-this-means-for-everyday-internet-users">
        
      </a>
    </div>
    <p>Accurate detection of CGNAT IPs is crucial for minimizing collateral effects in network operations and for ensuring fair and effective application of security measures. Our findings underscore the potential socio-economic and geographical variations in the use of CGNATs, revealing significant disparities in how IP addresses are shared across different regions.</p><p>At Cloudflare, we are going beyond using these insights to evaluate policies and practices: we are using the detection systems to improve features across our application security suite, and working with customers to understand how they might use these insights to improve the protections they configure.</p><p>Our work is ongoing, and we’ll share details as we go. In the meantime, if you’re an ISP or network operator that runs CGNAT and wants to help, get in touch at <a href="mailto:ask-research@cloudflare.com"><u>ask-research@cloudflare.com</u></a>. Sharing knowledge and working together helps create a better, more equitable experience for subscribers, while preserving the safety and security of web services.</p> ]]></content:encoded>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[WAF]]></category>
            <category><![CDATA[Web Application Firewall]]></category>
            <category><![CDATA[Better Internet]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[IPv4]]></category>
            <category><![CDATA[Network Services]]></category>
            <guid isPermaLink="false">9cTCNUkDdgVjdBN6M6JLv</guid>
            <dc:creator>Vasilis Giotsas</dc:creator>
            <dc:creator>Marwan Fayed</dc:creator>
        </item>
        <item>
            <title><![CDATA[15 years of helping build a better Internet: a look back at Birthday Week 2025]]></title>
            <link>https://blog.cloudflare.com/birthday-week-2025-wrap-up/</link>
            <pubDate>Mon, 29 Sep 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Rust-powered core systems, post-quantum upgrades, developer access for students, PlanetScale integration, open-source partnerships, and our biggest internship program ever — 1,111 interns in 2026. ]]></description>
            <content:encoded><![CDATA[ <p>Cloudflare launched fifteen years ago with a mission to help build a better Internet. Over that time, the Internet has changed, and so has what it needs from teams like ours. In this year’s <a href="https://blog.cloudflare.com/cloudflare-2025-annual-founders-letter/"><u>Founder’s Letter</u></a>, Matthew and Michelle discussed the role we have played in the evolution of the Internet, from helping encryption grow from 10% to 95% of Internet traffic to more recent challenges like how people consume content.</p><p>We spend Birthday Week every year releasing the products and capabilities we believe the Internet needs at this moment and around the corner. Previous <a href="https://blog.cloudflare.com/tag/birthday-week/"><u>Birthday Weeks</u></a> saw the launch of <a href="https://blog.cloudflare.com/introducing-cloudflares-automatic-ipv6-gatewa/"><u>IPv6 gateway</u></a> in 2011, <a href="https://blog.cloudflare.com/introducing-universal-ssl/"><u>Universal SSL</u></a> in 2014, <a href="https://blog.cloudflare.com/introducing-cloudflare-workers/"><u>Cloudflare Workers</u></a> and <a href="https://blog.cloudflare.com/unmetered-mitigation/"><u>unmetered DDoS protection</u></a> in 2017, <a href="https://blog.cloudflare.com/introducing-cloudflare-radar/"><u>Cloudflare Radar</u></a> in 2020, <a href="https://www.cloudflare.com/developer-platform/products/r2/"><u>R2 Object Storage</u></a> with zero egress fees in 2021, <a href="https://blog.cloudflare.com/post-quantum-tunnel/"><u>post-quantum upgrades for Cloudflare Tunnel</u></a> in 2022, and <a href="https://blog.cloudflare.com/best-place-region-earth-inference/"><u>Workers AI</u></a> and <a href="https://blog.cloudflare.com/announcing-encrypted-client-hello/"><u>Encrypted Client Hello</u></a> in 2023. Those are just a sample of the launches.</p><p>This year’s themes focused on helping prepare the Internet for a new model of monetization that encourages great content to be published, fostering more opportunities to build community both inside and outside of Cloudflare, and evergreen missions like making more features available to everyone and constantly improving the speed and security of what we offer.</p><p>We shipped a lot of new things this year. In case you missed the dozens of blog posts, here is a breakdown of everything we announced during Birthday Week 2025.</p><p><b>Monday, September 22</b></p>
<div><table><thead>
  <tr>
    <th><span>What</span></th>
    <th><span>In a sentence …</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><a href="https://blog.cloudflare.com/cloudflare-1111-intern-program/?_gl=1*rxpw9t*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MTgwNzEkajI4JGwwJGgw"><span>Help build the future: announcing Cloudflare’s goal to hire 1,111 interns in 2026</span></a></td>
    <td><span>To invest in the next generation of builders, we announced our most ambitious intern program yet with a goal to hire 1,111 interns in 2026.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/supporting-the-future-of-the-open-web/?_gl=1*1l701kl*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MTg0MDMkajYwJGwwJGgw"><span>Supporting the future of the open web: Cloudflare is sponsoring Ladybird and Omarchy</span></a></td>
    <td><span>To support a diverse and open Internet, we are now sponsoring Ladybird (an independent browser) and Omarchy (an open-source Linux distribution and developer environment).</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/new-hubs-for-startups/?_gl=1*s35rml*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MTg2NjEkajYwJGwwJGgw/"><span>Come build with us: Cloudflare’s new hubs for startups</span></a></td>
    <td><span>We are opening our office doors in four major cities (San Francisco, Austin, London, and Lisbon) as free hubs for startups to collaborate and connect with the builder community.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/ai-crawl-control-for-project-galileo/?_gl=1*n9jmji*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MTg2ODUkajM2JGwwJGgw"><span>Free access to Cloudflare developer services for non-profit and civil society organizations</span></a></td>
    <td><span>We extended our Cloudflare for Startups program to non-profits and public-interest organizations, offering free credits for our developer tools.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/workers-for-students/?_gl=1*lq39wt*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MTg3NDgkajYwJGwwJGgw"><span>Introducing free access to Cloudflare developer features for students</span></a></td>
    <td><span>We are removing cost as a barrier for the next generation by giving students with .edu emails 12 months of free access to our paid developer platform features.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/capnweb-javascript-rpc-library/?_gl=1*19mcm4k*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MjA2MTgkajYwJGwwJGgw"><span>Cap’n Web: a new RPC system for browsers and web servers</span></a></td>
    <td><span>We open-sourced Cap'n Web, a new JavaScript-native RPC protocol that simplifies powerful, schema-free communication for web applications.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/workers-launchpad-006/?_gl=1*8z9nf6*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MjA3MTckajUwJGwwJGgw"><span>A lookback at Workers Launchpad and a warm welcome to Cohort #6</span></a></td>
    <td><span>We announced Cohort #6 of the Workers Launchpad, our accelerator program for startups building on Cloudflare.</span></td>
  </tr>
</tbody></table></div><p><b>Tuesday, September 23</b></p>
<div><table><thead>
  <tr>
    <th><span>What</span></th>
    <th><span>In a sentence …</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><a href="https://blog.cloudflare.com/per-customer-bot-defenses/?_gl=1*1i1oipn*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MjA3NjAkajckbDAkaDA./"><span>Building unique, per-customer defenses against advanced bot threats in the AI era</span></a></td>
    <td><span>New anomaly detection system that uses machine learning trained on each zone to build defenses against AI-driven bot attacks. </span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/cloudflare-astro-tanstack/?_gl=1*v1uhzx*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MjE2MzckajYwJGwwJGgw"><span>Why Cloudflare, Netlify, and Webflow are collaborating to support Open Source tools</span></a></td>
    <td><span>To support the open web, we joined forces with Webflow to sponsor Astro, and with Netlify to sponsor TanStack.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/x402/?_gl=1*kizcyy*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MjA5OTUkajYkbDAkaDA./"><span>Launching the x402 Foundation with Coinbase, and support for x402 transactions</span></a></td>
    <td><span>We are partnering with Coinbase to create the x402 Foundation, encouraging the adoption of the </span><a href="https://github.com/coinbase/x402?cf_target_id=4D4A124640BFF471F5B56706F9A86B34"><span>x402 protocol</span></a><span> to allow clients and services to exchange value on the web using a common language</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/ai-crawl-control-for-project-galileo/?_gl=1*1r1zsjt*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MjE3NjYkajYwJGwwJGgw"><span>Helping protect journalists and local news from AI crawlers with Project Galileo</span></a></td>
    <td><span>We are extending our free Bot Management and AI Crawl Control services to journalists and news organizations through Project Galileo.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/confidence-score-rubric/"><span>Cloudflare Confidence Scorecards - making AI safer for the Internet</span></a></td>
    <td><span>Automated evaluation of AI and SaaS tools, helping organizations to embrace AI without compromising security.</span></td>
  </tr>
</tbody></table></div><p><b>Wednesday, September 24</b></p>
<div><table><thead>
  <tr>
    <th><span>What</span></th>
    <th><span>In a sentence …</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><a href="https://blog.cloudflare.com/automatically-secure/?_gl=1*8mjfiy*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MjE4MTckajkkbDAkaDA."><span>Automatically Secure: how we upgraded 6,000,000 domains by default</span></a></td>
    <td><span>Our Automatic SSL/TLS system has upgraded over 6 million domains to more secure encryption modes by default and will soon automatically enable post-quantum connections.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/content-signals-policy/?_gl=1*lfy031*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MjE5NTkkajYwJGwwJGgw/"><span>Giving users choice with Cloudflare’s new Content Signals Policy</span></a></td>
    <td><span>The Content Signals Policy is a new standard for robots.txt that lets creators express clear preferences for how AI can use their content.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/building-a-better-internet-with-responsible-ai-bot-principles/?_gl=1*hjo4nx*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MjIwMTIkajckbDAkaDA."><span>To build a better Internet in the age of AI, we need responsible AI bot principles</span></a></td>
    <td><span>A proposed set of responsible AI bot principles to start a conversation around transparency and respect for content creators' preferences.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/saas-to-saas-security/?_gl=1*tigi23*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MjIwNjgkajYwJGwwJGgw"><span>Securing data in SaaS to SaaS applications</span></a></td>
    <td><span>New security tools to give companies visibility and control over data flowing between SaaS applications.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/post-quantum-warp/?_gl=1*1vy23vv*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MjIyMDIkajYwJGwwJGgw"><span>Securing today for the quantum future: WARP client now supports post-quantum cryptography (PQC)</span></a></td>
    <td><span>Cloudflare’s WARP client now supports post-quantum cryptography, providing quantum-resistant encryption for traffic. </span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/a-simpler-path-to-a-safer-internet-an-update-to-our-csam-scanning-tool/?_gl=1*1avvoeq*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MjIxMTUkajEzJGwwJGgw"><span>A simpler path to a safer Internet: an update to our CSAM scanning tool</span></a></td>
    <td><span>We made our CSAM Scanning Tool easier to adopt by removing the need to create and provide unique credentials, helping more site owners protect their platforms.</span></td>
  </tr>
</tbody></table></div><p><b>Thursday, September 25</b></p>
<div><table><thead>
  <tr>
    <th><span>What</span></th>
    <th><span>In a sentence …</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><a href="https://blog.cloudflare.com/enterprise-grade-features-for-all/?_gl=1*ll2laa*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MjIyODIkajYwJGwwJGgw/"><span>Every Cloudflare feature, available to everyone</span></a></td>
    <td><span>We are making every Cloudflare feature, starting with Single Sign On (SSO), available for anyone to purchase on any plan. </span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/cloudflare-developer-platform-keeps-getting-better-faster-and-more-powerful/?_gl=1*1dwrmxx*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MjI0MzgkajYwJGwwJGgw/"><span>Cloudflare's developer platform keeps getting better, faster, and more powerful</span></a></td>
    <td><span>Updates across Workers and beyond for a more powerful developer platform – such as support for larger and more concurrent Container images, support for external models from OpenAI and Anthropic in AI Search (previously AutoRAG), and more. </span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/planetscale-postgres-workers/?_gl=1*1e87q21*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MjI2MDUkajYwJGwwJGgw"><span>Partnering to make full-stack fast: deploy PlanetScale databases directly from Workers</span></a></td>
    <td><span>You can now connect Cloudflare Workers to PlanetScale databases directly, with connections automatically optimized by Hyperdrive.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/cloudflare-data-platform/?_gl=1*1gj7lyv*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MjI5MDckajYwJGwwJGgw"><span>Announcing the Cloudflare Data Platform</span></a></td>
    <td><span>A complete solution for ingesting, storing, and querying analytical data tables using open standards like Apache Iceberg. </span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/r2-sql-deep-dive/?_gl=1*88kngf*_gcl_aw*R0NMLjE3NTg5MTQ0ODEuQ2p3S0NBanc4OWpHQmhCMEVpd0EybzFPbnp1VkVIN2UybUZJcERvWWtJMV9Rc2FlbTFEV19FU19qVjR1QnVmcEE3QVdkeU9zaVRIZGl4b0N4dHNRQXZEX0J3RQ..*_gcl_dc*R0NMLjE3NTgyMDc1NDEuQ2owS0NRancyNjdHQmhDU0FSSXNBT2pWSjRIWTFOVTZVWDFyVEJVNGNyd243d3RwX3lheFBuNnZJdXJlOUVmWmRzWkJJa1ZyejF4cDFDSWFBa2pBRUFMd193Y0I.*_gcl_au*MTI5NDk3ODE3OC4xNzUzMTQwMzIw*_ga*ZTI0NWUyMDQtZDM1YS00NTFkLWIwM2UtYjhhNzliZWQxY2Nj*_ga_SQCRB0TXZW*czE3NTg5MTY5NDEkbzYkZzEkdDE3NTg5MjI5MzAkajM3JGwwJGgw"><span>R2 SQL: a deep dive into our new distributed query engine</span></a></td>
    <td><span>A technical deep dive on R2 SQL, a serverless query engine for petabyte-scale datasets in R2.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/safe-in-the-sandbox-security-hardening-for-cloudflare-workers/"><span>Safe in the sandbox: security hardening for Cloudflare Workers</span></a></td>
    <td><span>A deep-dive into how we’ve hardened the Workers runtime with new defense-in-depth security measures, including V8 sandboxes and hardware-assisted memory protection keys.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/sovereign-ai-and-choice/"><span>Choice: the path to AI sovereignty</span></a></td>
    <td><span>To champion AI sovereignty, we've added locally-developed open-source models from India, Japan, and Southeast Asia to our Workers AI platform.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/email-service/"><span>Announcing Cloudflare Email Service’s private beta</span></a></td>
    <td><span>We announced the Cloudflare Email Service private beta, allowing developers to reliably send and receive transactional emails directly from Cloudflare Workers.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/nodejs-workers-2025/"><span>A year of improving Node.js compatibility in Cloudflare Workers</span></a></td>
    <td><span>There are hundreds of new Node.js APIs now available that make it easier to run existing Node.js code on our platform. </span></td>
  </tr>
</tbody></table></div><p><b>Friday, September 26</b></p>
<table><thead>
  <tr>
    <th><span>What</span></th>
    <th><span>In a sentence …</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><a href="https://blog.cloudflare.com/20-percent-internet-upgrade"><span>Cloudflare just got faster and more secure, powered by Rust</span></a></td>
    <td><span>We have re-engineered our core proxy with a new modular, Rust-based architecture, cutting median response time by 10ms for millions. </span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/introducing-observatory-and-smart-shield/"><span>Introducing Observatory and Smart Shield</span></a></td>
    <td><span>New monitoring tools in the Cloudflare dashboard that provide actionable recommendations and one-click fixes for performance issues.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/monitoring-as-sets-and-why-they-matter/"><span>Monitoring AS-SETs and why they matter</span></a></td>
    <td><span>Cloudflare Radar now includes Internet Routing Registry (IRR) data, allowing network operators to monitor AS-SETs to help prevent route leaks.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/an-ai-index-for-all-our-customers"><span>An AI Index for all our customers</span></a></td>
    <td><span>We announced the private beta of AI Index, a new service that creates an AI-optimized search index for your domain that you control and can monetize.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/new-regional-internet-traffic-and-certificate-transparency-insights-on-radar/"><span>Introducing new regional Internet traffic and Certificate Transparency insights on Cloudflare Radar</span></a></td>
    <td><span>Sub-national traffic insights and Certificate Transparency dashboards for TLS monitoring.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/eliminating-cold-starts-2-shard-and-conquer/"><span>Eliminating Cold Starts 2: shard and conquer</span></a></td>
    <td><span>We have reduced Workers cold starts by 10x by implementing a new "worker sharding" system that routes requests to already-loaded Workers.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/network-performance-update-birthday-week-2025/"><span>Network performance update: Birthday Week 2025</span></a></td>
    <td><span>Our TCP connection time (trimean) measurements show that we have the fastest TCP connection time in 40% of measured ISPs – and the fastest across the top networks.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/how-cloudflare-uses-the-worlds-greatest-collection-of-performance-data/"><span>How Cloudflare uses performance data to make the world’s fastest global network even faster</span></a></td>
    <td><span>We are using our network's vast performance data to tune congestion control algorithms, improving speeds by an average of 10% for QUIC traffic.</span></td>
  </tr>
  <tr>
    <td><a href="https://blog.cloudflare.com/code-mode/"><span>Code Mode: the better way to use MCP</span></a></td>
    <td><span>It turns out we've all been using MCP wrong. Most agents today use MCP by exposing the "tools" directly to the LLM. We tried something different: convert the MCP tools into a TypeScript API, then ask an LLM to write code that calls that API. The results are striking.</span></td>
  </tr>
</tbody></table>
    <div>
      <h3>Come build with us!</h3>
      <a href="#come-build-with-us">
        
      </a>
    </div>
    <p>Helping build a better Internet has always been about more than just technology. From announcements about interns to working together in our offices, the community of people helping build a better Internet matters to its future. This week, we rolled out our most ambitious set of initiatives ever to support the builders, founders, and students who are creating the future.</p><p>For founders and startups, we are thrilled to welcome <b>Cohort #6</b> to the <b>Workers Launchpad</b>, our accelerator program that gives early-stage companies the resources they need to scale. But we’re not stopping there. We’re opening our doors, literally, by launching <b>new physical hubs for startups</b> in our San Francisco, Austin, London, and Lisbon offices. These spaces will provide access to mentorship, resources, and a community of fellow builders.</p><p>We’re also investing in the next generation of talent. We announced <b>free access to the Cloudflare developer platform for all students</b>, giving them the tools to learn and experiment without limits. To provide a path from the classroom to the industry, we also announced our goal to hire <b>1,111 interns in 2026</b> — our biggest commitment yet to fostering future tech leaders.</p><p>And because a better Internet is for everyone, we’re extending our support to <b>non-profits and public-interest organizations</b>, offering them free access to our production-grade developer tools, so they can focus on their missions.</p><p>Whether you're a founder with a big idea, a student just getting started, or a team working for a cause you believe in, we want to help you succeed.</p>
    <div>
      <h3>Until next year</h3>
      <a href="#until-next-year">
        
      </a>
    </div>
    <p>Thank you to our customers, our community, and the millions of developers who trust us to help them build, secure, and accelerate the Internet. Your curiosity and feedback drive our innovation.</p><p>It’s been an incredible 15 years. And as always, we’re just getting started!</p><p><i>(Watch the full conversation on our show </i><a href="https://ThisWeekinNET.com"><i>ThisWeekinNET.com</i></a><i> about what we launched during Birthday Week 2025 </i><a href="https://youtu.be/Z2uHFc9ua9s?feature=shared"><i><b><u>here</u></b></i></a><i>.) </i></p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Partners]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Workers Launchpad]]></category>
            <category><![CDATA[Performance]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Cache]]></category>
            <category><![CDATA[Speed]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[1.1.1.1]]></category>
            <category><![CDATA[Application Security]]></category>
            <category><![CDATA[Application Services]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[CDN]]></category>
            <category><![CDATA[Cloudflare for Startups]]></category>
            <category><![CDATA[Cloudflare One]]></category>
            <category><![CDATA[Cloudflare Zero Trust]]></category>
            <category><![CDATA[Cloudflare Workers]]></category>
            <guid isPermaLink="false">4k1NhJtljIsH7GOkpHg1Ei</guid>
            <dc:creator>Nikita Cano</dc:creator>
            <dc:creator>Korinne Alpers</dc:creator>
        </item>
        <item>
            <title><![CDATA[Building unique, per-customer defenses against advanced bot threats in the AI era]]></title>
            <link>https://blog.cloudflare.com/per-customer-bot-defenses/</link>
            <pubDate>Tue, 23 Sep 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Today, we are announcing a new approach to catching bots: using models to provide behavioral anomaly detection unique to each bot management customer and stop sophisticated bot attacks.  ]]></description>
            <content:encoded><![CDATA[ <p>Today, we are announcing a new approach to catching bots: using models to provide <b>behavioral anomaly detection </b><b><i>unique to each bot management customer</i></b> and stop sophisticated bot attacks. </p><p>With this per-customer approach, we’re giving every bot management customer hyper-personalized security capabilities to stop even the sneakiest bots. We’re doing this not only by making a first-request judgement call, but also by tracking the behavior of bots that play the long game and continuously execute unwanted behavior on our customers’ websites. We want to share how this service works, and where we’re focused. Our new platform has the power to fuel hundreds of thousands of unique detection suites, and we’ve heard our first target loud and clear from site owners: <a href="https://www.cloudflare.com/the-net/building-cyber-resilience/regain-control-ai-crawlers/"><u>protect websites</u></a> from the explosion of sophisticated, AI-driven web scraping.</p>
    <div>
      <h2>The new arms race: the rise of AI-driven scraping</h2>
      <a href="#the-new-arms-race-the-rise-of-ai-driven-scraping">
        
      </a>
    </div>
    <p>The battle against malicious bots used to be a simpler affair. Attackers used scripts that were fairly easy to identify through static, predictable signals: a request with a missing User-Agent header, a malformed method name, or traffic from a non-standard port was a clear indicator of malicious intent. However, the Internet is always evolving. As websites became more dynamic to create rich user experiences, attackers evolved their tools in response. The simple scripts of yesterday were replaced by headless browsers and automation frameworks, capable of rendering pages and mimicking human interaction with far greater fidelity.</p><p>AI has made this even trickier. The rise of <a href="https://www.cloudflare.com/learning/ai/what-is-generative-ai/"><u>Generative AI</u></a> has fundamentally changed the capabilities and the motivations of attackers. The web scraping of today isn’t limited to competitive price intelligence or content aggregation, but is driven by the voracious appetite of <a href="https://www.cloudflare.com/learning/ai/what-is-large-language-model/"><u>Large Language Models (LLMs)</u></a> for training data.</p><p>Cloudflare’s data shows this shift in stark terms. In mid-2025, <a href="https://radar.cloudflare.com/ai-insights?dateStart=2025-07-01&amp;dateEnd=2025-07-07#crawl-purpose"><b><u>crawling for the purpose of AI model training accounted for nearly 80% of all AI bot activity</u></b></a> on our network, a significant increase from the year prior. Modern scraping tools are now AI-powered themselves. They leverage LLMs for semantic understanding of page content, use computer vision to solve visual challenges, and employ reinforcement learning to navigate complex websites they’ve never seen before. The evolution of these bots exposes a critical vulnerability in the traditional, one-size-fits-all approach to security. 
While global threat intelligence is immensely powerful for stopping widespread attacks, these new <b>AI-powered scrapers are designed to blend in</b>. They can rotate IP addresses through residential proxies, generate human-like user agents, and mimic plausible browsing patterns. A request from one of these bots might not look anomalous when compared to the trillions of requests we see across the Cloudflare network, but would appear anomalous when compared to the established patterns of legitimate users on a specific website. This means we need to build defenses against these bots from every angle we have — from the global view to specific behavior on a single application. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3muiMDClrUwUrh5yoDbqlv/9df48cc59dcefed98b16b7df7f72fbd6/image3.png" />
          </figure>
    <div>
      <h2>Globally scalable bot fingerprinting</h2>
      <a href="#globally-scalable-bot-fingerprinting">
        
      </a>
    </div>
    <p>To target specific well-known bots or bot actors, we leverage the Cloudflare network to fingerprint bots that we see behave similarly across millions of websites. Since June, Cloudflare’s bot detection security analysts have written <b>50 heuristics</b> to catch bots using a variety of signals, including but not limited to <b>HTTP/2 fingerprints</b> and <b>Client Hello extensions. </b>By observing traffic on millions of websites, we establish a baseline of legitimate fingerprints of common browsers and benign devices. When a new, unique fingerprint suddenly appears across many different sites, it's a tell-tale sign of a distributed botnet or a new automation tool, allowing our analysts to block the bot's signature itself and neutralize the entire campaign, regardless of the thousands of different IP addresses it might use.</p><p>Recently, we also introduced <a href="https://developers.cloudflare.com/bots/additional-configurations/detection-ids/#additional-detections"><b><u>detection improvements to tackle residential proxy networks</u></b></a> and similar commercial proxies, which are used by attackers to make their bots appear as thousands of distinct real visitors, allowing them to bypass traditional security measures. The superpower of this detection improvement? Combining the vast amount of network data we see with particular client-side fingerprints obtained through the millions of challenge solves that happen across the Internet daily. 
<a href="https://developers.cloudflare.com/cloudflare-challenges/"><u>Challenges</u></a> have always served as an ideal mitigation action for customers who want to protect their applications without compromising real-user experience, but now they also serve as a gift that keeps on giving: in this case, <b><i>feeding the Cloudflare threat detection teams a constant stream of client-side information</i></b> that allows us to pattern match to determine IP addresses that are used by residential proxy networks.</p><p>This detection improvement is already ingesting data from the entire Cloudflare network, automatically catching more malicious traffic for all customers using <a href="https://developers.cloudflare.com/bots/get-started/super-bot-fight-mode/"><u>Super Bot Fight Mode</u></a> (bot protection included for Pro, Business, and all Enterprise customers) and <a href="https://developers.cloudflare.com/bots/get-started/bot-management/"><u>Enterprise Bot Management</u></a>. Examining 7 days of data from the time of authoring this post, we’ve observed <b>11 billion requests</b> from millions of unique IP addresses that we’ve identified as connected to residential or commercial proxy networks. This is just one piece of the global detection puzzle; the existing <a href="https://blog.cloudflare.com/residential-proxy-bot-detection-using-machine-learning/"><u>residential proxy detection features in our ML</u></a><b> </b>already catch <i>tens of millions of requests every hour</i>. </p>
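<p>The cross-site fingerprinting idea above can be sketched in a few lines. This is an illustrative toy, not Cloudflare's implementation: the function name, threshold, and data shapes are all assumptions. The core idea is that a fingerprint absent from the baseline of known-good browser fingerprints that suddenly appears across many distinct zones gets flagged as a likely botnet or automation tool.</p>

```python
from collections import defaultdict

def flag_new_fingerprints(requests, known_good, zone_threshold=100):
    """requests: iterable of (fingerprint, zone_id) pairs seen network-wide.

    A fingerprint (e.g. an HTTP/2 or Client Hello hash) that is absent
    from the baseline of legitimate browser/device fingerprints, yet
    suddenly shows up on many different sites, is a tell-tale sign of a
    distributed botnet or a new automation tool."""
    zones_per_fp = defaultdict(set)
    for fp, zone in requests:
        zones_per_fp[fp].add(zone)
    return {fp for fp, zones in zones_per_fp.items()
            if fp not in known_good and len(zones) >= zone_threshold}
```

<p>Blocking on the flagged fingerprint itself, rather than on IP addresses, is what neutralizes an entire campaign no matter how many proxies it rotates through.</p>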
    <div>
      <h2>Hyper-personalized security: learning what's normal for <i>you</i></h2>
      <a href="#hyper-personalized-security-learning-whats-normal-for-you">
        
      </a>
    </div>
    <p>The new arms race against AI-powered bots necessitates a closer look — something more precise. For instance, a script that systematically scrapes every user profile on a social media site, or every product listing on an e-commerce platform, is exhibiting behavior that is fundamentally abnormal for <i>that application</i>, even if a standalone request appears benign. This realization is at the heart of our new strategy: to win this new arms race, defenses must become as bespoke and adaptive as the attacks they face.</p><p>To meet this challenge, we built a new, foundational platform engineered to deploy custom <a href="https://www.cloudflare.com/learning/ai/what-is-machine-learning/"><u>machine learning models</u></a> for every bot management customer. We’re creating a unique defense for every application. Because each website has different traffic, the traffic that we flag as anomalous will, of course, be different for each zone — for this system, we want to be clear that data from one customer’s zone won’t be used to train the model for another customer’s use.</p><p>Announcing this as a new platform capability, rather than a single feature, is a deliberate choice. It aligns with how we’ve approached our most significant innovations, from <a href="https://www.cloudflare.com/developer-platform/products/workers/"><u>Cloudflare Workers</u></a> changing how developers build applications, to <a href="https://www.cloudflare.com/developer-platform/products/ai-gateway/"><u>AI Gateway</u></a> creating a single control plane for AI observability and security. 
By focusing on the platform, we tackle the <a href="https://www.cloudflare.com/learning/ai/how-to-prevent-web-scraping/">scraping problems</a> our customers are seeing today <i>and</i> power future detections as bot attacks become increasingly sophisticated.</p><p>Our new generation of per-customer anomaly detection is a three-step process, designed to identify malicious behavior by first understanding what constitutes legitimate traffic for each individual website and API.</p>
    <div>
      <h3>Step 1: Establishing a dynamic baseline</h3>
      <a href="#step-1-establishing-a-dynamic-baseline">
        
      </a>
    </div>
    <p>For each customer zone, our behavioral detections ingest traffic data to build a baseline of normal activity. Rather than taking a static snapshot, our new platform ingests data to make living, continuously updated calculations of what “normal” looks like on a specific website. This approach understands seasonality, recognizes traffic spikes from legitimate marketing campaigns, and maps the typical pathways users take through a site. This approach evolves the concept of Anomaly Detection already present in our Enterprise Bot Management suite, but applies it at a far more granular and dynamic per-customer level.</p>
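<p>A living, continuously updated baseline of this kind can be approximated with an exponentially weighted mean and variance, so that recent traffic (seasonality, marketing spikes) outweighs stale history. The class below is a minimal sketch under that assumption; it is not the production model.</p>

```python
class ZoneBaseline:
    """Continuously updated per-zone baseline for one traffic metric,
    e.g. requests per minute to a login endpoint. An exponential
    moving average lets "normal" drift with legitimate traffic."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha   # higher alpha: baseline adapts faster
        self.mean = None
        self.var = 0.0

    def update(self, value):
        if self.mean is None:        # first observation seeds the baseline
            self.mean = float(value)
            return
        diff = value - self.mean
        self.mean += self.alpha * diff
        # Exponentially weighted variance: how spiky "normal" is here.
        self.var = (1 - self.alpha) * (self.var + self.alpha * diff * diff)
```

<p>Feeding each observed interval's metric through <code>update()</code> keeps the baseline current without storing raw history, which is what makes a per-zone model cheap enough to run for every customer.</p>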
    <div>
      <h3>Step 2: Identifying the anomalies</h3>
      <a href="#step-2-identifying-the-anomalies">
        
      </a>
    </div>
    <p>Once the baseline of "normal" is established, we begin the true work — identifying deviations. Because the baseline is specific to each website, the anomalies detected are highly contextual, perhaps even invisible to a global system. We can examine a few different types of websites to unpack this:</p><ul><li><p><b>For a gaming company:</b> A normal traffic baseline might show millions of users making frequent, rapid API calls to a matchmaking service or an in-game inventory system. A behavioral detection model trained on this baseline would immediately flag a single user making slow, methodical, sequential API calls to scrape the entire player leaderboard. This behavior, while low in volume, is a clear anomaly against the backdrop of normal gameplay patterns.</p></li><li><p><b>For a retail website:</b> The normal baseline is a complex funnel of users browsing categories, viewing products, adding items to a cart, and proceeding to checkout. These detections would identify an actor that systematically visits every single product page in alphabetical order at a machine-like pace, without ever interacting with the cart or session cookies, as a significant anomaly indicative of <a href="https://www.cloudflare.com/learning/bots/what-is-content-scraping/"><u>content scraping</u></a>.</p></li><li><p><b>For a media publisher:</b> Normal user behavior involves reading a few articles, following internal links, and spending a measurable amount of time on each page. An anomaly would be a script that hits thousands of article URLs per minute, spending less than a second on each, purely to extract the text content for AI model training.</p></li></ul><p>In each case, the malicious activity is defined not by a universal signature, but <b><i>by its deviation from the application's unique, established norm</i></b>.</p>
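<p>The retail example can be made concrete with a toy heuristic. The thresholds and function below are illustrative assumptions, not the actual detection logic: a session that walks product pages in strict lexicographic order at a near-constant, machine-like pace deviates sharply from a normal browse/cart/checkout funnel.</p>

```python
import statistics

def looks_like_scraper(paths, timestamps, min_requests=20):
    """Toy version of the retail example: flag a session that visits
    pages in strict lexicographic order with near-constant gaps."""
    if len(paths) < min_requests:
        return False
    in_order = all(a <= b for a, b in zip(paths, paths[1:]))
    gaps = [t2 - t1 for t1, t2 in zip(timestamps, timestamps[1:])]
    # Human think-times vary widely; scripted clients tick like clocks.
    machine_paced = statistics.pstdev(gaps) < 0.05 * statistics.mean(gaps)
    return in_order and machine_paced
```

<p>A real model would of course weigh many more signals (cart interaction, session cookies, referrers), but even this caricature separates a sequential crawl from human navigation.</p>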
    <div>
      <h3>Step 3: Generating actionable findings</h3>
      <a href="#step-3-generating-actionable-findings">
        
      </a>
    </div>
    <p>Detecting an anomaly is only half the battle. The power of bot management comes from its seamless integration into the Cloudflare security ecosystem you already use, turning detection into immediate, actionable findings. Customers can benefit from these behavioral detection improvements in two ways:</p><ol><li><p><b>New Bot Detection IDs: </b>For our Enterprise customers, we’re introducing a new set of <a href="https://developers.cloudflare.com/bots/additional-configurations/detection-ids/"><u>Bot Detection IDs</u></a>. Website owners and security teams can write WAF security rules to challenge, rate-limit, or block traffic based on the specific anomalies flagged by these detections. Since each detection type is tied to a unique ID, customers can see exactly what kind of behavior caused a request to be flagged as anomalous, offering a detailed, per-request view into stealthy malicious traffic. And for a wider view, customers can filter by Detection ID from their Security Analytics, to see the bigger picture of all traffic captured by that detection type.</p></li><li><p><b>Improving Bot Score:</b> Another key output from these new, per-customer models will be to directly influence the Bot Score of a request. A request flagged as anomalous will have its score lowered, moving it into the "Likely Automated" (scores 2-29) or "Automated" (score 1) categories. This means that existing WAF custom rules based on Bot Score will automatically see impact and become more effective against bespoke attacks, with no changes required. 
This functionality update is available today for our latest <a href="https://developers.cloudflare.com/bots/additional-configurations/detection-ids/#account-takeover-detections"><u>account takeover detection</u></a>, <a href="https://blog.cloudflare.com/residential-proxy-bot-detection-using-machine-learning/"><u>residential proxy detections</u></a> and our recent <a href="https://developers.cloudflare.com/bots/additional-configurations/detection-ids/#additional-detections"><u>enhancements</u></a>, and will be implemented in the future for our behavioral scraping detection. </p></li></ol><p>This three-step process is already in action with our behavioral detections to catch <a href="https://developers.cloudflare.com/bots/additional-configurations/detection-ids/#account-takeover-detections"><u>account takeover</u></a> attacks. Taking bot detection ID 201326598 as an example: it (1) establishes a zone-level baseline that understands what normal traffic patterns look like for a specific website, (2) examines anomalous login failures to identify brute force and credential stuffing attacks, then (3) allows customers to mitigate these attacks by automatically influencing bot score <i>and</i> offering more visibility with the detection ID’s analytics. </p>
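<p>The score-lowering effect described in (2) can be sketched as follows. The score bands are the documented ones quoted above; the function name, the example detection ID reuse, and the clamping rule are illustrative assumptions rather than the actual scoring pipeline.</p>

```python
def adjust_bot_score(base_score, detection_ids,
                     behavioral_ids=frozenset({201326598})):
    """Sketch: a request flagged by a per-customer behavioral detection
    has its Bot Score pulled down into the "Likely Automated" band
    (2-29), so existing WAF rules keyed on Bot Score catch it with no
    rule changes. Unflagged requests keep their original score."""
    if detection_ids & behavioral_ids:
        return min(base_score, 29)
    return base_score
```

<p>This is why the integration requires no customer action: any rule of the form "challenge when score &lt; 30" automatically starts catching behaviorally flagged traffic.</p>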
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5w8HUyr51JD8K4EYT7teeL/ed825aa96c3ae1809199d32734f0e60d/image4.png" />
          </figure><p>This integration strategy creates a flywheel effect: the new intelligence from these improved detections immediately enhances the value of existing products like Super Bot Fight Mode, Bot Management, and the WAF, making the entire Cloudflare platform stronger for you.</p>
    <div>
      <h2>Taking on sophisticated scrapers</h2>
      <a href="#taking-on-sophisticated-scrapers">
        
      </a>
    </div>
    <p>The first challenge we’re tackling is sophisticated scraping. AI-driven scraping is one of the most pressing and rapidly evolving threats facing website owners today, and its adaptive nature makes it an ideal adversary for a system designed to fight an enemy that constantly changes its tactics.</p><p>The first generation of our improved behavioral detections are tuned specifically to detect scraping by analyzing signals that go beyond simple request headers. These include:</p><ul><li><p><b>Behavioral Analysis:</b> Looking at session traversal paths, the sequence of requests, and interaction (or lack thereof) with dynamic page elements.</p></li><li><p><b>Client Fingerprinting:</b> Analyzing subtle signals from the client to identify signs of automation such as JA4 fingerprints in the context of the customer's specific traffic baseline.</p></li><li><p><b>Content-Agnostic Detection:</b> These models do not need to understand the content of a page, only the patterns of how it is being accessed. This makes them highly scalable and efficient, without actually using the unique content on a website to make judgement calls.</p></li></ul><p>How do these scraping detections look, in practice? We validated our logic for detecting scraping with early adopters in a closed beta, in order to receive ground-truth feedback and tune our detections. As with any ideal detection, our goal is to capture as much malicious traffic as possible, without compromising the experience of legitimate website visitors. Looking at just a 24-hour period, our new scraping detections have caught hundreds of millions of requests, flagging <b>138 million scraping requests on just 5 of our early beta zones</b>.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3dmVkAJR9ELqrGMFR4tbcI/732bbb2477c350ec97d8fcd70d57b782/image2.png" />
          </figure><p>Naturally, we see an overlap with our existing system of bot scoring, but the numbers here show us concretely that our new behavioral detections have a completely new value add: <b>34% of the requests flagged by our new scraping detections would not have been detected by our existing bot score system</b>, making us all the more eager to use these novel detections to inform the way we score automation.</p>
    <div>
      <h2>A birthday gift for the Internet</h2>
      <a href="#a-birthday-gift-for-the-internet">
        
      </a>
    </div>
    <p>Our mission to help build a better Internet means that when we develop powerful new defenses, we believe in democratizing access to them. Protecting the entire Internet from new and evolving threats requires raising the baseline of security for everyone.</p><p>In that spirit, we’re excited to announce that our enhanced behavioral detections will not only roll out to bot management customers, but will also benefit Cloudflare customers using our global Super Bot Fight Mode system. For our Enterprise Bot Management customers, we automatically tune our detections based on the exact traffic for each zone. Because these advanced models are trained on your zone’s specific traffic, they detect even the most evasive attacks: from account takeovers to web scraping to other attacks executed through residential proxy networks — and we consider this only the tip of the iceberg of behavioral bot profiling. </p>
    <div>
      <h2>The road ahead</h2>
      <a href="#the-road-ahead">
        
      </a>
    </div>
    <p>Our initial focus on scraping is just the beginning of a new wave of behavioral bot detections. The infrastructure we’ve built is a flexible, powerful foundation for tackling a wide range of malicious behavior on your websites; the same principles of establishing a per-customer baseline and detecting anomalies can be applied to other critical threats that are unique to an application's logic, such as credential stuffing, inventory hoarding, carding attacks, and API abuse.</p><p>We are moving into an era where generic defenses are no longer enough. As threats become more personal, so must the defenses against them, and paving this path of behavioral detections is our latest gift to the Internet. Our first offering of scraping behavioral detections is just around the corner: customers will be able to turn on this new detection from the <a href="https://dash.cloudflare.com/?to=/:account/:zone/security/overview"><u>Security Overview</u></a> page in their dashboard. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/9EW8B0vJ43k28c5USM5Ho/6a180ca73844c7432749ca36a12684aa/image5.png" />
          </figure><p>(We’re always looking for enthusiastic humans to help us in our mission against bots! If you’re interested in helping us build a better Internet, check out our <a href="https://www.cloudflare.com/careers/jobs/"><u>open positions.</u></a>)</p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[Bot Management]]></category>
            <guid isPermaLink="false">1l4pM7l0pDUGAgKypKgs15</guid>
            <dc:creator>Jin-Hee Lee</dc:creator>
            <dc:creator>Oliver Payne</dc:creator>
            <dc:creator>Bob AminAzad</dc:creator>
            <dc:creator>Viktor Chynarov</dc:creator>
            <dc:creator>Aleksandar Pavlov Hrusanov</dc:creator>
            <dc:creator>Prajjwal Gupta</dc:creator>
        </item>
        <item>
            <title><![CDATA[The crawl-to-click gap: Cloudflare data on AI bots, training, and referrals]]></title>
            <link>https://blog.cloudflare.com/crawlers-click-ai-bots-training/</link>
            <pubDate>Fri, 29 Aug 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ By mid-2025, training drives nearly 80% of AI crawling, while referrals to publishers (especially from Google) are falling and crawl-to-refer ratios show AI consumes far more than it sends back. ]]></description>
            <content:encoded><![CDATA[ <p>In 2025, Generative AI is reshaping how people and companies use the Internet. Search engines once drove traffic to content creators through links. Now, AI training crawlers — the engines behind commonly-used LLMs — are consuming vast amounts of web data, while sending far fewer users back. We covered this shift, along with related <a href="https://blog.cloudflare.com/from-googlebot-to-gptbot-whos-crawling-your-site-in-2025/"><u>trends</u></a> and Cloudflare <a href="https://blog.cloudflare.com/tag/pay-per-crawl/"><u>features</u></a> (like pay per crawl) in early July. Studies from Pew Research Center (<a href="https://www.pewresearch.org/short-reads/2025/04/28/americans-largely-foresee-ai-having-negative-effects-on-news-journalists/"><u>1</u></a>, <a href="https://www.pewresearch.org/short-reads/2025/07/22/google-users-are-less-likely-to-click-on-links-when-an-ai-summary-appears-in-the-results/"><u>2</u></a>) and <a href="https://pressgazette.co.uk/media-audience-and-business-data/google-ai-overviews-publishers-report-clickthroughs-authoritas-report/"><u>Authoritas</u></a> already point to AI overviews — Google’s new AI-generated summaries shown at the top of search results — contributing to sharp declines in news website traffic. For a news site, this means lots of bot hits, but far fewer real readers clicking through — which in turn means fewer people clicking on ads or chances to convert to subscriptions.</p><p>Cloudflare's data shows the same pattern. Crawling by search engines and AI services surged in the first half of 2025 — up 24% year-over-year in June — before slowing to just 4% year-over-year growth in July. How is the space evolving? Which crawling purposes are most common, and how is that changing? Spoiler: training-related crawling is leading the way. In this post, we track AI and search bot crawl activity, what purposes dominate, and which platforms contribute the least referral traffic back to creators.</p>
    <div>
      <h3>Key takeaways</h3>
      <a href="#key-takeaways">
        
      </a>
    </div>
    <ul><li><p>Training crawling grows: Training now drives nearly 80% of AI bot activity, up from 72% a year ago.</p></li><li><p>Publisher referrals drop: Google referrals to news sites fell, with March 2025 down ~9% compared to January.</p></li><li><p>AI &amp; search crawling increase: Crawling rose 32% year-over-year in April 2025, before slowing to 4% year-over-year growth in July.</p></li><li><p>AI-only crawler shifts: OpenAI’s GPTBot more than doubled in share of AI crawling traffic (4.7% to 11.7%), Anthropic’s ClaudeBot rose (6% to ~10%), while ByteDance’s Bytespider fell from 14.1% to 2.4%.</p></li><li><p>Crawl-to-refer imbalance (how many pages a bot crawls per page that a user clicks back to): Anthropic increased referrals but still leads with 38,000 crawls per visitor in July (down from 286,000:1 in January). Perplexity decreased referrals in 2025 — with more crawling but fewer referrals at 194 crawls per visitor in July.</p></li></ul><p>Several of the trends in this blog use <a href="https://radar.cloudflare.com/ai-insights"><u>Cloudflare Radar’s new AI Insights</u></a> features, explained in more detail in the post: “<a href="http://blog.cloudflare.com/ai-crawler-traffic-by-purpose-and-industry"><b><u>A deeper look at AI crawlers: breaking down traffic by purpose and industry</u></b></a>.”</p>
    <div>
      <h2>Google referrals fall as AI Overviews expand</h2>
      <a href="#google-referrals-fall-as-ai-overviews-expand">
        
      </a>
    </div>
    <p>Referral traffic from search is already shifting, as we noted above and as studies have shown. In our dataset of news-related customers (spanning the Americas, Europe, and Asia), Google’s referrals have been clearly declining since February 2025. This drop is unusual, since overall Internet traffic (and referrals as well) historically has only dipped during July and August — the summer months when the Northern Hemisphere is largely on break from school or work. The sharpest and least seasonal decline came in March. Despite being a 31-day month, March had almost the same referral volume as the shorter, 28-day February.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1ZWlDsTAtPveEo2Kq8nzu9/ebd655d9ea51f35cfae1f4d09cfecc76/1.png" />
          </figure><p>Looking at longer comparisons: March 2025 referral traffic from Google was 9% lower than January, the same drop seen in June. April was worse, down 15% compared with January.</p><p>This drop seems to coincide with some of Google’s changes. AI Overviews launched in the U.S. in <a href="https://blog.google/products/search/generative-ai-google-search-may-2024/"><u>May 2024</u></a>, but in March 2025, Google upgraded AI Overviews with Gemini 2.0, introduced AI Mode in Labs, and <a href="https://blog.google/feed/were-bringing-the-helpfulness-of-ai-overviews-to-more-countries-in-europe/"><u>expanded</u></a> Overviews to more European countries. By May 2025, AI Mode rolled out broadly in the U.S. with Gemini 2.5, adding conversational search, Deep Search, and personalized recommendations.</p><p>The search-to-news site pipeline seems to be weakening, replaced in part by AI-driven results.</p><p>Looking at a daily perspective, we can also spot a clear U.S.-election-related peak in referrals from Google to the cohort of known news sites on November 5–6, 2024.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1Gtq4mnTg8KdVWaUkpH51A/86e7f7dfeb31f846df4ae8486c25b4aa/2.png" />
          </figure>
    <div>
      <h2>AI and search crawling: spring surge (+24%), summer slowdown</h2>
      <a href="#ai-and-search-crawling-spring-surge-24-summer-slowdown">
        
      </a>
    </div>
    <p><a href="https://blog.cloudflare.com/from-googlebot-to-gptbot-whos-crawling-your-site-in-2025/"><u>In June</u></a>, we talked about search and AI crawler growth, and our picture of the trend is now more complete with more data. To focus only on AI and search crawlers, and to remove the bias of customer growth, we analyzed a fixed set of customers from specific weeks, a method we’ve also used in the <a href="http://radar.cloudflare.com/year-in-review/"><u>Cloudflare Radar Year in Review</u></a>.</p><p>What the data shows: crawling spiked twice: first in November 2024, then again between March and April 2025. April 2025 alone was up 32% compared with May 2024, the first full month where we have comparable data. After that surge, growth stabilized. In June 2025, crawling traffic was still 24% higher year-over-year, but by July the increase was down to just 4%. That shift highlights how quickly crawler activity can accelerate and then cool down.</p><p>As the chart below shows, crawling traffic rose sharply in March and April. It remained high but slightly lower in May, before starting to drop in June. The seasonal dip is similar to what we see in overall Internet traffic during the Northern Hemisphere’s summer months (August and September are often the quietest), though in the case of crawlers, this is likely due to reduced overall web activity rather than bots themselves taking a “break.” Historically, activity tends to rise again in November — as it did in 2024 for AI and search bot traffic — when people spend more time online for shopping and seasonal habits (a pattern we’ve seen in <a href="https://blog.cloudflare.com/from-deals-to-ddos-exploring-cyber-week-2024-internet-trends/"><u>past years</u></a>).</p>
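<p>The fixed-cohort normalization described above can be sketched in a few lines of Python. This is an illustrative assumption, not Cloudflare’s actual pipeline: the customer names, request counts, and helper function are invented, but the idea — compare only customers present in both periods, so growth in the customer base itself does not inflate the trend — is the one described in the text.</p>

```python
# Hypothetical sketch: year-over-year crawler growth on a fixed cohort of
# customers. All names and counts below are illustrative.

def fixed_cohort_yoy_growth(period_a, period_b):
    """Compare total requests across only the customers present in BOTH
    periods, returning the growth of period_b over period_a as a percent."""
    cohort = period_a.keys() & period_b.keys()  # customers seen in both years
    total_a = sum(period_a[c] for c in cohort)
    total_b = sum(period_b[c] for c in cohort)
    return 100.0 * (total_b - total_a) / total_a

# Customer -> crawler request counts. "new-site" only exists in 2025, so a
# naive comparison would overstate growth; the fixed cohort excludes it.
april_2024 = {"site-a": 1000, "site-b": 500}
april_2025 = {"site-a": 1400, "site-b": 580, "new-site": 900}

print(fixed_cohort_yoy_growth(april_2024, april_2025))  # 32.0
```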
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1SKJcH4r7smlgCBC9vjULt/1311a9ded068a142122630af5afc3766/3.png" />
          </figure><p>Googlebot is <a href="https://blog.cloudflare.com/from-googlebot-to-gptbot-whos-crawling-your-site-in-2025/"><u>still</u></a> the anchor, accounting for 39% of all AI and search crawler traffic, but the fastest growth now comes from AI-specific crawlers, though bots related to Amazon and ByteDance (Bytespider) have lost significant ground. GPTBot’s share grew from 4.7% in July 2024 to 11.7% in July 2025. ClaudeBot also increased, from 6% to nearly 10%, while Meta’s crawler jumped from 0.9% to 7.5%. By contrast, Amazonbot dropped from 10.2% to 5.9%, and ByteDance’s Bytespider dropped from 14.1% to just 2.4%.</p><p>The table below shows how market shares have shifted between July 2024 and July 2025:</p><table><tr><td><p>
</p></td><td><p><b>Bot name</b></p></td><td><p><b>% share July 2024</b></p></td><td><p><b>% share July 2025</b></p></td><td><p><b>Δ percentage-point change</b></p></td></tr><tr><td><p><b>1</b></p></td><td><p>Googlebot</p></td><td><p>37.5</p></td><td><p>39</p></td><td><p>1.5</p></td></tr><tr><td><p><b>2</b></p></td><td><p>GPTBot</p></td><td><p>4.7</p></td><td><p>11.7</p></td><td><p>7</p></td></tr><tr><td><p><b>3</b></p></td><td><p>ClaudeBot</p></td><td><p>6</p></td><td><p>9.9</p></td><td><p>3.9</p></td></tr><tr><td><p><b>4</b></p></td><td><p>Bingbot</p></td><td><p>8.7</p></td><td><p>9.3</p></td><td><p>0.6</p></td></tr><tr><td><p><b>5</b></p></td><td><p>Meta-ExternalAgent</p></td><td><p>0.9</p></td><td><p>7.5</p></td><td><p>6.5</p></td></tr><tr><td><p><b>6</b></p></td><td><p>Amazonbot</p></td><td><p>10.2</p></td><td><p>5.9</p></td><td><p>-4.3</p></td></tr><tr><td><p><b>7</b></p></td><td><p>Googlebot-Image</p></td><td><p>4.1</p></td><td><p>3.3</p></td><td><p>-0.8</p></td></tr><tr><td><p><b>8</b></p></td><td><p>Yandex</p></td><td><p>5</p></td><td><p>2.9</p></td><td><p>-2.1</p></td></tr><tr><td><p><b>9</b></p></td><td><p>GoogleOther</p></td><td><p>4.6</p></td><td><p>2.7</p></td><td><p>-1.8</p></td></tr><tr><td><p><b>10</b></p></td><td><p>Bytespider</p></td><td><p>14.1</p></td><td><p>2.4</p></td><td><p>-11.6</p></td></tr><tr><td><p><b>11</b></p></td><td><p>Applebot</p></td><td><p>1.8</p></td><td><p>1.5</p></td><td><p>-0.3</p></td></tr><tr><td><p><b>12</b></p></td><td><p>ChatGPT-User</p></td><td><p>0.1</p></td><td><p>0.9</p></td><td><p>0.9</p></td></tr><tr><td><p><b>13</b></p></td><td><p>OAI-SearchBot</p></td><td><p>0</p></td><td><p>0.9</p></td><td><p>0.9</p></td></tr><tr><td><p><b>14</b></p></td><td><p>Baiduspider</p></td><td><p>0.5</p></td><td><p>0.5</p></td><td><p>0</p></td></tr><tr><td><p><b>15</b></p></td><td><p>Googlebot-Mobile</p></td><td><p>0.2</p></td><td><p>0.4</p></td><td><p>0.2</p></td></tr></table>
    <div>
      <h2>AI-only crawlers: OpenAI rises, ByteDance falls</h2>
      <a href="#ai-only-crawlers-openai-rises-bytedance-falls">
        
      </a>
    </div>
    <p>Looking only at AI bot traffic (as tracked on our <a href="https://radar.cloudflare.com/explorer?dataSet=ai.bots&amp;groupBy=user_agent&amp;dt=2025-07-01_2025-07-31&amp;timeCompare=2024-07-01"><u>Radar AI page</u></a>), the trend is clear. Since January 2025, GPTBot has steadily increased its crawling volume, driven mainly by training-related activity. ClaudeBot crawling accelerated in June, while Amazonbot and Bytespider activity slowed.</p><p>The <a href="https://radar.cloudflare.com/explorer?dataSet=ai.bots&amp;groupBy=user_agent&amp;dt=2025-07-01_2025-07-31&amp;timeCompare=2024-07-01"><u>chart</u></a> below shows how GPTBot surged over the past 12 months, overtaking Amazonbot and Bytespider, which both fell sharply:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5XRamYFTPqDrQ0bMQSG4C7/e741692f7019a4842b5d82bf4ab64106/4.png" />
          </figure><p>A comparison between July 2024 and July 2025 makes the shift even more obvious. GPTBot gained 16 percentage points, Meta’s crawler rose by more than 15, and ClaudeBot grew by 8. On the shrinking side, Amazonbot dropped 12 percentage points and Bytespider dropped over 31 percentage points.</p><table><tr><td><p>
</p></td><td><p><b>AI-only bots</b></p></td><td><p>July 2024 %</p></td><td><p>July 2025 %</p></td><td><p>Δ percentage-point change</p></td></tr><tr><td><p>1</p></td><td><p>GPTBot</p></td><td><p>11.9</p></td><td><p>28.1</p></td><td><p>16.1</p></td></tr><tr><td><p>2</p></td><td><p>ClaudeBot</p></td><td><p>15</p></td><td><p>23.3</p></td><td><p>8.3</p></td></tr><tr><td><p>3</p></td><td><p>Meta-ExternalAgent</p></td><td><p>2.4</p></td><td><p>17.7</p></td><td><p>15.3</p></td></tr><tr><td><p>4</p></td><td><p>Amazonbot</p></td><td><p>26.4</p></td><td><p>14.1</p></td><td><p>-12.3</p></td></tr><tr><td><p>5</p></td><td><p>Bytespider</p></td><td><p>37.3</p></td><td><p>5.8</p></td><td><p>-31.5</p></td></tr><tr><td><p>6</p></td><td><p>Applebot</p></td><td><p>4.9</p></td><td><p>3.7</p></td><td><p>-1.2</p></td></tr><tr><td><p>7</p></td><td><p>ChatGPT-User</p></td><td><p>0.2</p></td><td><p>2.4</p></td><td><p>2.2</p></td></tr><tr><td><p>8</p></td><td><p>OAI-SearchBot</p></td><td><p>0</p></td><td><p>2.2</p></td><td><p>2.2</p></td></tr><tr><td><p>9</p></td><td><p>TikTokSpider</p></td><td><p>0</p></td><td><p>0.7</p></td><td><p>0.7</p></td></tr><tr><td><p>10</p></td><td><p>imgproxy</p></td><td><p>0</p></td><td><p>0.7</p></td><td><p>0.7</p></td></tr><tr><td><p>11</p></td><td><p>PerplexityBot</p></td><td><p>0</p></td><td><p>0.4</p></td><td><p>0.4</p></td></tr><tr><td><p>12</p></td><td><p>Google-CloudVertexBot</p></td><td><p>0</p></td><td><p>0.3</p></td><td><p>0.3</p></td></tr><tr><td><p>13</p></td><td><p>AI2Bot</p></td><td><p>0</p></td><td><p>0.2</p></td><td><p>0.2</p></td></tr><tr><td><p>14</p></td><td><p>Timpibot</p></td><td><p>0.6</p></td><td><p>0.1</p></td><td><p>-0.5</p></td></tr><tr><td><p>15</p></td><td><p>CCBot</p></td><td><p>0.1</p></td><td><p>0.1</p></td><td><p>0</p></td></tr></table>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/71p4CgiUXwYrb9LIsJCruI/44dd4b232a715b852417853e7026fbcb/5.png" />
          </figure><p>We covered the functionality of these bots in our <a href="https://blog.cloudflare.com/from-googlebot-to-gptbot-whos-crawling-your-site-in-2025/#ai-only-crawlers-perspective"><u>June blog post</u></a>.</p>
    <div>
      <h2>Crawling by purpose: training dominates</h2>
      <a href="#crawling-by-purpose-training-dominates">
        
      </a>
    </div>
    <p>Training is the clear leader.<i> (We classify purpose based on operator disclosures and industry sources, a method we explained in this </i><a href="http://blog.cloudflare.com/ai-crawler-traffic-by-purpose-and-industry"><i><u>AI Week blog</u></i></a><i>.)</i> Over the past 12 months, 80% of AI crawling was for training, compared with 18% for search and just 2% for user actions. In the last six months, the share for training rose further to 82%, while search dropped to 15% and user actions increased slightly to 3%.</p><p>The <a href="https://radar.cloudflare.com/ai-insights#crawl-purpose"><u>chart</u></a> below shows how training-related crawling steadily grew over the past year, far outpacing other purposes:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/10lBzdfhgLKiWrEAIcs691/8b11d8d733c48938a7235dc07f65a83a/6.png" />
          </figure><p>The year-over-year comparison reinforces this trend. In July 2024, training accounted for 72% of AI crawling. By July 2025, it had risen to 79%. Over the same period, search fell from 26% to 17%, while user actions grew modestly from 2% to 3.2%.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2OcV2pA5nOBpOrl8pKPotL/4901f128d5feaba82357972509ba09f2/7.png" />
          </figure>
    <div>
      <h2>Crawl-to-refer ratio shifts: tens of thousands of bot crawls per human click</h2>
      <a href="#crawl-to-refer-ratios-shifts-tens-of-thousands-of-bot-crawls-per-human-click">
        
      </a>
    </div>
    <p>The crawl-to-refer ratio measures how many pages a platform crawls compared with how often it drives users to a website. In practice, a high ratio means heavy crawling but little referral traffic. For example, for every visitor Anthropic refers back to a website, its crawlers have already visited tens of thousands of pages.</p><p>Why does this metric matter? It highlights the imbalance between how much content AI systems consume and how little traffic they return. For publishers, it can feel like giving away the raw material for free. With that in mind, here’s how different platforms compare from January to July 2025.</p><p>Anthropic remains the most crawl-heavy platform. Even after an 87% decline this year, it still crawled 38,000 pages for every referred page visit in July 2025 — the highest imbalance among major AI players. Referrals may be improving, though, after Anthropic added <a href="https://www.anthropic.com/news/web-search"><u>web search to Claude in March 2025</u></a> (initially for U.S. paid users) and expanded it globally by <a href="https://www.brightedge.com/claude-search"><u>May to all users, including the free tier</u></a>. The feature introduced direct citations with clickable URLs, creating new referral pathways.</p><p>The full dataset is below, showing January–July 2025 ratios by platform ordered by the highest ratio average:
(Note: a rising ratio means <i>more</i> crawling per human click sent back; a falling ratio means <i>less</i>.)

<b>Crawl-to-refer ratio (from </b><a href="https://radar.cloudflare.com/ai-insights?dateStart=2025-07-01&amp;dateEnd=2025-07-31#crawl-to-refer-ratio"><b><u>Cloudflare Radar’s data</u></b></a><b>)</b></p><table><tr><td><p><b>Service</b></p></td><td><p><b>Jan</b></p></td><td><p><b>Feb</b></p></td><td><p><b>Mar</b></p></td><td><p><b>Apr</b></p></td><td><p><b>May</b></p></td><td><p><b>Jun</b></p></td><td><p><b>Jul</b></p></td><td><p><b>Average</b></p></td><td><p><b>% Change Jan-Jul</b></p></td></tr><tr><td><p><b>Anthropic</b></p></td><td><p>286,930.1</p></td><td><p>271,748.2</p></td><td><p>121,612.7</p></td><td><p>130,330.2</p></td><td><p>114,313</p></td><td><p>71,282.8</p></td><td><p>38,065.7</p></td><td><p>147,754.7</p></td><td><p>-86.7%</p></td></tr><tr><td><p><b>OpenAI</b></p></td><td><p>1,217.4</p></td><td><p>1,774.5</p></td><td><p>2,217</p></td><td><p>1200</p></td><td><p>995.6</p></td><td><p>1,655.9</p></td><td><p>1,091.4</p></td><td><p>1,437.8</p></td><td><p>-10.4%</p></td></tr><tr><td><p><b>Perplexity</b></p></td><td><p>54.6</p></td><td><p>55.3</p></td><td><p>201.3</p></td><td><p>300.9</p></td><td><p>199.1</p></td><td><p>200.6</p></td><td><p>194.8</p></td><td><p>172.4</p></td><td><p>256.7%</p></td></tr><tr><td><p><b>Microsoft</b></p></td><td><p>38.5</p></td><td><p>44.2</p></td><td><p>42.3</p></td><td><p>43.3</p></td><td><p>45.1</p></td><td><p>42</p></td><td><p>40.7</p></td><td><p>42.3</p></td><td><p>5.7%</p></td></tr><tr><td><p><b>Yandex</b></p></td><td><p>15.5</p></td><td><p>13.1</p></td><td><p>13.1</p></td><td><p>15.7</p></td><td><p>14.7</p></td><td><p>15.9</p></td><td><p>21.4</p></td><td><p>15.6</p></td><td><p>38.3%</p></td></tr><tr><td><p><b>Google</b></p></td><td><p>3.8</p></td><td><p>6.3</p></td><td><p>14.6</p></td><td><p>22.5</p></td><td><p>16.7</p></td><td><p>13.1</p></td><td><p>5.4</p></td><td><p>11.8</p></td><td><p>43%</p></td></tr><tr><td><p><b>ByteDance</b></p></td><td><p>18</p></td><td><p>16.4</p></td><td><p>3.5</p></td><td><p>2.3</p></td><td><p>1.6
</p></td><td><p>1.6</p></td><td><p>0.9</p></td><td><p>6.3</p></td><td><p>-95%</p></td></tr><tr><td><p><b>Baidu</b></p></td><td><p>0.6</p></td><td><p>0.7</p></td><td><p>0.8</p></td><td><p>1.5</p></td><td><p>1.2</p></td><td><p>1</p></td><td><p>0.9</p></td><td><p>1</p></td><td><p>44.5%</p></td></tr><tr><td><p><b>DuckDuckGo</b></p></td><td><p>0.1</p></td><td><p>0.2</p></td><td><p>0.2</p></td><td><p>0.2</p></td><td><p>0.3</p></td><td><p>0.3</p></td><td><p>0.3</p></td><td><p>0.2</p></td><td><p>116.3%</p></td></tr></table><p>Looking at the changes from January to July 2025:</p><ul><li><p><b>Anthropic</b> recorded the steepest decrease in bot-to-human traffic, down <b>86.7%</b>: from 286,930 crawls per visitor in January to 38,065 in July, a dramatic increase in referrals. Despite the change, it remains by far the most crawl-heavy platform, with tens of thousands of pages still crawled for every referral.</p></li><li><p><b>Perplexity</b> moved in the opposite direction, with crawling relative to human visitors up <b>256.7%</b>, climbing from <b>54 crawls per visitor</b> in January to <b>195</b> in July. While the ratio is still far below Anthropic’s, the increase shows Perplexity is crawling more heavily, relative to the traffic it refers, than it did earlier in the year.</p></li><li><p><b>OpenAI</b>’s ratio dropped slightly, from 1,217 crawls per visitor in January to 1,091 in July (-10%). The shift is smaller than Anthropic’s, but suggests OpenAI is sending a bit more referral traffic relative to its crawling.</p></li><li><p><b>Microsoft</b> stayed steady, with its ratio moving only slightly, from 38.5 crawls per visitor in January to 40.7 in July (+6%). This consistency suggests stable behavior from Bing-linked services.</p></li><li><p><b>Yandex</b> increased from 15.5 crawls per visitor in January to 21.4 in July (+38%). 
The overall ratio is far smaller than Anthropic’s or Perplexity’s, but it shows Yandex is crawling more heavily relative to the traffic it sends back.</p></li></ul><p>Alongside measuring crawling volumes and referral traffic (now also visible on the<a href="https://radar.cloudflare.com/ai-insights#ai-bot-best-practices"><u> AI Insights page of Cloudflare Radar</u></a>), it’s worth looking at whether AI operators follow good practices when deploying their bots. Cloudflare data shows that most leading AI crawlers are on our <a href="https://radar.cloudflare.com/bots#verified-bots"><u>verified bots</u></a> list, meaning their IP addresses match published ranges and they respect robots.txt. But adoption of newer standards like <a href="https://developers.cloudflare.com/bots/concepts/bot/verified-bots/web-bot-auth/"><u>WebBotAuth</u></a> — which uses cryptographic signatures in HTTP messages to confirm a request comes from a specific bot, and is especially relevant today — is still missing. </p><p>Meta, OpenAI, and Anthropic run distinct bots for different purposes, while Google and Microsoft rely on unified crawlers. Anthropic, however, still lags in verification, which makes it easier for bad actors to spoof its crawler and ignore robots.txt. Without verification, it’s difficult to distinguish real from fake traffic — leaving its compliance effectively unclear. (A longer list of AI bots is available <a href="https://radar.cloudflare.com/ai-insights#ai-bot-best-practices"><u>here</u></a>).</p>
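<p>The arithmetic behind the crawl-to-refer ratio in this section is simple: crawled HTML pages divided by referred HTML page requests. The sketch below uses invented counts and a hypothetical helper name to show the computation; it is not Cloudflare’s measurement code.</p>

```python
# Hypothetical sketch of the crawl-to-refer ratio: HTML page requests made by
# a platform's crawler, divided by HTML page requests referred by that
# platform (human click-throughs). Counts are illustrative only.

def crawl_to_refer_ratio(crawl_requests, referred_requests):
    """Pages crawled per page view referred back; None when nothing is referred."""
    if referred_requests == 0:
        return None  # heavy crawling with no referrals at all
    return crawl_requests / referred_requests

# Invented example: 25 pages crawled for every visitor sent back.
print(crawl_to_refer_ratio(crawl_requests=100, referred_requests=4))  # 25.0
```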
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4EvNGFKp6pGQUP84P33qJG/b646c0aad05d68d3f9c4a37d08bd483f/8.png" />
          </figure>
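<p>One piece of the verification discussed above — checking that a crawler’s source IP falls inside the ranges its operator publishes — can be sketched as follows. The bot name and CIDR ranges are invented for illustration; real verification would also consider reverse DNS, user agent consistency, and (where offered) Web Bot Auth signatures.</p>

```python
# Hypothetical sketch of IP-range verification for a crawler. The ranges here
# are documentation prefixes, not any operator's real published ranges.
import ipaddress

PUBLISHED_RANGES = {
    "ExampleBot": [ipaddress.ip_network("203.0.113.0/24"),
                   ipaddress.ip_network("2001:db8::/32")],
}

def ip_matches_published_ranges(bot_name, client_ip):
    """True if the client IP is inside any range the operator publishes."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in PUBLISHED_RANGES.get(bot_name, []))

print(ip_matches_published_ranges("ExampleBot", "203.0.113.55"))  # True
print(ip_matches_published_ranges("ExampleBot", "198.51.100.7"))  # False
```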
    <div>
      <h2>Conclusion and what’s next</h2>
      <a href="#conclusion-and-whats-next">
        
      </a>
    </div>
    <p>If training-related crawling continues to dominate while referrals stay flat, creators face a paradox: feeding AI systems without gaining traffic in return. Many want their content to appear in chatbot answers, but without monetization or cooperation, the incentive to produce quality work declines.</p><p>The Web now stands at a fork in the road. Either a new balance emerges — one where the new AI era helps sustain publishers and creators — or AI turns the open web into a one-way training set, extracting value with little flowing back.</p><p>You can learn more about some of these data trends on Cloudflare Radar’s updated<a href="https://radar.cloudflare.com/ai-insights"><u> AI Insights page</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[AI Week]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Radar]]></category>
            <category><![CDATA[Internet Trends]]></category>
            <category><![CDATA[Traffic]]></category>
            <category><![CDATA[Bots]]></category>
            <guid isPermaLink="false">71UVAVb7ICHgxWp6yhCLoA</guid>
            <dc:creator>João Tomé</dc:creator>
        </item>
        <item>
            <title><![CDATA[A deeper look at AI crawlers: breaking down traffic by purpose and industry]]></title>
            <link>https://blog.cloudflare.com/ai-crawler-traffic-by-purpose-and-industry/</link>
            <pubDate>Thu, 28 Aug 2025 14:05:00 GMT</pubDate>
            <description><![CDATA[ We are extending AI-related insights on Cloudflare Radar with new industry-focused data and a breakdown of bot traffic by purpose, such as training or user action.  ]]></description>
            <content:encoded><![CDATA[ <p>Search platforms historically crawled web sites with the implicit promise that, as the sites showed up in the results for relevant searches, they would send traffic on to those sites — in turn leading to ad revenue for the publisher. This model worked fairly well for several decades, with a whole industry emerging around optimizing content for favorable placement in search results. It led to higher click-through rates, more eyeballs for publishers, and, ideally, more ad revenue. However, the emergence of AI platforms over the last several years, and the incorporation of AI "overviews" into classic search platforms, has turned the model on its head. When users turn to these AI platforms with queries that used to go to search engines, they often won't click through to the original source site once an answer is provided — and that assumes a link to the source is provided at all! No clickthrough, no eyeballs, and no ad revenue. </p><p>To provide a perspective on the scope of this problem, Radar <a href="https://blog.cloudflare.com/ai-search-crawl-refer-ratio-on-radar/"><u>launched</u></a> <a href="https://radar.cloudflare.com/ai-insights#crawl-to-refer-ratio"><u>crawl/refer ratios</u></a> on July 1, based on traffic seen across our whole customer base. These ratios effectively compare the number of crawling requests for HTML pages from the <a href="https://www.cloudflare.com/learning/bots/what-is-a-web-crawler/"><u>crawler</u></a> associated with a given platform, to the number of HTML page requests referred by that platform (measuring human traffic). 
This data complements insights into <a href="https://radar.cloudflare.com/ai-insights#ai-bot-crawler-traffic"><u>AI bot &amp; crawler traffic trends</u></a> that were <a href="https://blog.cloudflare.com/bringing-ai-to-cloudflare/#ai-bot-traffic-insights-on-cloudflare-radar"><u>launched</u></a> during Birthday Week 2024.</p><p>Today, we're adding two new capabilities to the <a href="https://radar.cloudflare.com/ai-insights"><b><u>AI Insights</u></b></a> page on Cloudflare Radar to give you more insight into this activity: industry-focused AI bot traffic data, and a new breakdown of AI bot traffic by its purpose.</p>
    <div>
      <h2>Traffic by type</h2>
      <a href="#traffic-by-type">
        
      </a>
    </div>
    <p>Since the launch of <a href="https://www.cloudflare.com/learning/ai/what-is-large-language-model/"><u>LLMs</u></a> into the public consciousness in November 2022, much of the crawling traffic seen from user agents associated with AI platforms has been to collect content used to train AI models. This crawling activity can be aggressive at times, often ignoring <a href="https://radar.cloudflare.com/ai-insights#ai-user-agents-found-in-robotstxt"><u>directives found in robots.txt files</u></a>. In addition to offering chatbots trained on this <a href="https://www.cloudflare.com/learning/bots/what-is-content-scraping/"><u>scraped content</u></a>, AI platforms have emerged that aim to replace classic search tools, while those tools have themselves integrated AI-powered summaries as part of their results. These platforms may crawl your site to build indexes for their search engines. And some AI platforms may crawl your site in response to a specific user prompt, such as looking for flights to plan a vacation.</p><p>The new <b>Crawl purpose</b> selector within the <b>AI bot &amp; crawler traffic</b> card allows users to select between <b>Training</b>, <b>Search</b>, <b>User action</b>, and <b>Undeclared</b>. (The latter is for crawlers where no information is available from the operator or other industry sources regarding its purpose.) </p>
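<p>In spirit, the purpose breakdown amounts to a lookup table from crawler user agents to declared purposes, with a fallback of <b>Undeclared</b> when an operator publishes nothing. The sketch below is a minimal, assumed illustration of such a mapping — the table entries and function name are ours, not Radar’s implementation — though the bot-to-purpose pairings shown match how those bots are described in this post.</p>

```python
# Hypothetical sketch: classifying a crawler's purpose from its user agent
# token, in the spirit of Radar's Training / Search / User action /
# Undeclared categories. The mapping is illustrative and hand-maintained;
# the real classification is based on operator disclosures and industry
# sources.

CRAWL_PURPOSE = {
    "GPTBot": "Training",
    "ClaudeBot": "Training",
    "OAI-SearchBot": "Search",
    "ChatGPT-User": "User action",
    "Perplexity-User": "User action",
}

def classify_crawl_purpose(user_agent_token):
    """Return the declared purpose, or 'Undeclared' when nothing is published."""
    return CRAWL_PURPOSE.get(user_agent_token, "Undeclared")

print(classify_crawl_purpose("ChatGPT-User"))  # User action
print(classify_crawl_purpose("MysteryBot"))    # Undeclared
```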
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4bIoxF54OFCmecoOWOHDQ3/8e252d3ffbb4f948a76158661a4b013a/1_-_crawlpurpose-dropdown.png" />
          </figure><p>Once a purpose is selected, the <a href="https://radar.cloudflare.com/ai-insights#http-traffic-by-bot"><b><u>HTTP traffic by bot</u></b></a> graph updates to show traffic trends over the selected time period for the top five most active AI bots that crawl for the selected purpose.</p><p>As an example, selecting <b>User action</b> results in a <a href="https://radar.cloudflare.com/ai-insights?dateStart=2025-07-01&amp;dateEnd=2025-07-28#http-traffic-by-bot"><u>graph</u></a> like the one below, which covers the first 28 days of July 2025. OpenAI’s <i>ChatGPT-User</i> bot is responsible for nearly three quarters of the request traffic from this cohort of crawlers. A daily cycle is clearly evident, suggesting regular usage of ChatGPT in that fashion, with such usage gradually increasing throughout the month. If <i>ChatGPT-User </i>is removed from the chart, <i>Perplexity-User</i> also exhibits a similar pattern.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/Vt5HUwATxJgWezhbpyA0N/f1b2745802ba4c1b7ee33b3c77b6ed4d/2_-_http_traffic_-_user_action.png" />
          </figure><p>A new <a href="https://radar.cloudflare.com/ai-insights#crawl-purpose"><b><u>Crawl purpose</u></b></a> graph has also been added to Radar, breaking out traffic trends by purpose. <i>Training</i> traffic, responsible for nearly 80% of the crawling from AI bots, is somewhat erratic in nature, with no clear cyclical pattern. However, such patterns are visible for the <i>User action</i> and <i>Undeclared</i> purposes, as shown in the <a href="https://radar.cloudflare.com/ai-insights?dateStart=2025-07-01&amp;dateEnd=2025-07-28#crawl-purpose"><u>graph</u></a> below, although they account for less than 5% of AI bot traffic across this time period.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2jis2lHk6KjbWpOQPcARmy/7ae33385be2ac1d820104a2dc22f489a/3_-_crawlpurpose-graph.png" />
          </figure><p>Within the <a href="https://radar.cloudflare.com/explorer?dataSet=ai.bots"><u>Data Explorer</u></a> view for the <b>AI Bots &amp; Crawlers</b> dataset, you can now <a href="https://radar.cloudflare.com/explorer?dataSet=ai.bots&amp;dt=28d&amp;groupBy=crawl_purpose"><u>break the data down by </u><b><u>Crawl purpose</u></b></a> to explore how the activity has changed over time. Alternatively, you can <a href="https://radar.cloudflare.com/explorer?dataSet=ai.bots&amp;dt=28d&amp;groupBy=user_agent&amp;filters=crawlPurpose%253DTraining"><u>break the data down by </u><b><u>User agent</u></b><u>, and filter by </u><b><u>Crawl purpose</u></b></a>, to explore traffic trends across a larger set of bots (beyond the top five). <a href="https://radar.cloudflare.com/explorer?dataSet=ai.bots&amp;dt=28d&amp;groupBy=user_agent&amp;filters=crawlPurpose%253DTraining&amp;timeCompare=1"><u>Comparisons with previous time periods</u></a> are available here as well.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6kCgMWSeVGYdQ9jnkAOMhe/ab71e21d0b620b78b72aaf90f7ecbb46/4_-_dataexplorer_-_training.png" />
          </figure>
    <div>
      <h2>Visibility by industry</h2>
      <a href="#visibility-by-industry">
        
      </a>
    </div>
    <p>You can use your own traffic data to see how aggressively crawlers <a href="https://www.cloudflare.com/learning/ai/how-to-prevent-web-scraping/">scrape</a> your content. You can also see how frequently they refer traffic back to you. However, you may also want to understand how those measurements compare with your peer group — are you being crawled more or less frequently, and are the platforms referring more or less traffic back to your sites? The new industry set filtering available for the <a href="https://radar.cloudflare.com/ai-insights#http-traffic-by-bot"><b><u>HTTP traffic by bot</u></b><u> graph</u></a> and the <a href="https://radar.cloudflare.com/ai-insights#crawl-to-refer-ratio"><b><u>Crawl-to-refer ratio</u></b><u> table</u></a> in the <a href="https://radar.cloudflare.com/ai-insights"><b><u>AI Insights</u></b></a> section of Radar can provide you with this perspective.</p><p>Within the <b>AI bot &amp; crawler traffic</b> card on the AI Insights page, select an industry set from the drop-down list at the top right of the card. The graphs in the <b>HTTP traffic by bot</b> and <b>Crawl purpose</b> sections of the card update to reflect the selection, as does the <b>Crawl-to-refer ratio</b> table. (Selecting a <b>Crawl purpose</b> from that drop-down menu will further update the <b>HTTP traffic by bot</b> graph.)</p>
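The crawl-to-refer ratio itself is simple arithmetic over two counts you can pull from your own logs. As a rough sketch (the function name and the example counts below are illustrative, not how Radar computes its published figures):

```javascript
// Illustrative sketch: a crawl-to-refer ratio from two log-derived counts --
// requests made by an AI platform's crawlers against your site, and requests
// that platform referred back to you over the same window.
function crawlToReferRatio(crawlRequests, referredRequests) {
  if (referredRequests === 0) return Infinity; // crawled, but never referred
  return crawlRequests / referredRequests;
}

// Made-up counts: 50,000 crawl hits against 425 referred visits comes out
// to roughly 117.6 crawls for every visit referred back.
console.log(crawlToReferRatio(50000, 425).toFixed(1) + ":1"); // "117.6:1"
```

A higher ratio means a platform is consuming far more of your content than it is sending visitors back for, which is the comparison the industry-set filtering below lets you make against your peer group.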
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6NBLZ4KnJ2A75L92a3bVK4/1665549e5761b0ae449d651a49ba7e64/5_-_industry_set_-_dropdown.png" />
          </figure><p>It is interesting to observe how the crawling patterns change between industry sets, along with the mix of most active bots and crawl-to-refer ratios. For example, across the first week of August, with <a href="https://radar.cloudflare.com/ai-insights?dateStart=2025-08-01&amp;dateEnd=2025-08-07#http-traffic-by-bot"><u>no vertical or crawl purpose selected</u></a>, <b>ClaudeBot</b> and <b>GPTBot</b> account for nearly half of the observed crawling activity, with <b>Meta-ExternalAgent</b> the only one among the top five exhibiting activity that remotely resembles a pattern. For the default view, Anthropic had the highest crawl-to-refer ratio at nearly 50,000:1, followed by OpenAI at 887:1 and Perplexity at 118:1.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2StNvYYHAK9PZ6U0tGvwiH/68266c10a50ef70507a645a5dfcc2059/6_-_http_traffic_-_no_vertical.png" />
          </figure><p>However, when the <a href="https://radar.cloudflare.com/ai-insights?industrySet=News+%26+Publications&amp;dateStart=2025-08-01&amp;dateEnd=2025-08-07"><b><u>News and Publications industry set is selected</u></b></a>, we see a much tighter distribution of traffic among the top five, ranging from <b>ChatGPT-User</b>’s 14.9% share of traffic to <b>GPTBot</b>’s 17.4% share. <b>ChatGPT-User</b>’s presence among the top five suggests that a significant number of users may have been asking questions about current events during that period. For these <b>News and Publications</b> sites, the crawl-to-refer ratios are lower than in the default view, with Anthropic at 2,500:1, OpenAI at 152:1, and Perplexity at 32.7:1. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4EpH7k6tQKSMTdXIQtoG1y/7ad2383f442e390760d0eb2a3d3b7127/7_-_industry_set_-_news___publications.png" />
          </figure><p>As a third example, we find that the mix again shifts for the <a href="https://radar.cloudflare.com/ai-insights?industrySet=Computer+%26+Electronics&amp;dateStart=2025-08-01&amp;dateEnd=2025-08-07#http-traffic-by-bot"><b><u>Computer and Electronics industry set</u></b></a>. While <b>GPTBot</b> was again the most active AI bot, <b>Amazonbot</b> moved up into second place; together these bots now account for over 40% of crawling traffic. <b>ClaudeBot</b> and <b>Meta-ExternalAgent</b> both had a 13.9% share of the crawling traffic, with ByteDance’s <b>ByteSpider</b> rounding out the top five. The crawl-to-refer ratios for this vertical are again lower than for the unfiltered view, with Anthropic down to 8,800:1, OpenAI at 401.7:1, and Perplexity at 88:1.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5KjHMu0t6uCJAHEjgzDiNz/31267af8484006c6be1b834107cb3052/8_-_industry_set_-_computer___electronics.png" />
          </figure><p>Within Data Explorer, you can now break down <b>AI Bots &amp; Crawlers</b> data by Vertical and Industry. (A vertical is a pre-defined collection of related industries.) You can also filter <b>Crawl purpose</b> and <b>User agent</b> breakdowns by Vertical and Industry. For example, the graphs below illustrate the <a href="https://radar.cloudflare.com/explorer?dataSet=ai.bots&amp;groupBy=user_agent&amp;dt=2025-08-01_2025-08-07&amp;filters=vertical%253DFinance%252Cindustry%253DCryptocurrency#result"><u>traffic trends by AI crawler</u></a> for sites within the <b>Cryptocurrency</b> industry under the <b>Finance</b> vertical, as well as the <a href="https://radar.cloudflare.com/explorer?dataSet=ai.bots&amp;groupBy=crawl_purpose&amp;dt=2025-08-01_2025-08-07&amp;filters=vertical%253DFinance%252Cindustry%253DCryptocurrency#result"><u>traffic trends by crawl purpose</u></a> for that industry/vertical pair. While these sites see crawling traffic from quite a few bots, three-quarters of that traffic during the first week of August was concentrated in just four bots, and 80% of it was for gathering information to train models.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/39MVSCz4a41eKDqIR0Dj4Z/5489805b938051212ca0374e892ef756/9_-_dataexplorer_-_http_traffic_-_finance_cryptocurrency.png" />
          </figure>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7ppfZea6L4fdZ4RKWIVNq5/a605a2f3b45bb6ef540ca57d78bb145e/10_-_dataexplorer_-_crawl_purpose_-_finance_cryptocurrency.png" />
          </figure><p>Because the Industry sets shown on the main <b>AI Insights</b> page are manually curated collections of related industries, clicking through to the Data Explorer view from one of those graphs will pre-populate the Industry selector with the relevant entries. For example, clicking through from the <a href="https://radar.cloudflare.com/ai-insights?industrySet=Gaming+%26+Gambling#http-traffic-by-bot"><b><u>HTTP traffic by bot</u></b><u> graph for the </u><b><u>Gaming &amp; Gambling</u></b><u> industry set</u></a> results in the following <a href="https://radar.cloudflare.com/explorer?dataSet=ai.bots&amp;groupBy=user_agent&amp;filters=industry%253DComputer%25252520Games%25252CGambling%25252520%25252526%25252520Casinos%25252CGambling%25252520and%25252520Casinos%2525253B%25252520Recreation%25252CGaming&amp;dt=2025-08-01_2025-08-07"><u>Data Explorer view</u></a>, which lists the component industries.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/60FepjNCd25CFKWTQzdVsq/2772c2782c93772f4a55364f06846bd5/11_-_dataexplorer_-_gaming_gambling_industries.png" />
          </figure>
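The Data Explorer links used throughout this section share a common query-string shape: a `dataSet`, a `groupBy` dimension, a date range in `dt`, and an encoded `filters` list. As a sketch, a link like the Finance/Cryptocurrency view above can be rebuilt from those parts (the helper name is ours, not a Cloudflare API):

```javascript
// Hypothetical helper: assemble a Radar Data Explorer URL from parts.
// The query keys (dataSet, groupBy, dt, filters) mirror the links in this post.
function dataExplorerUrl({ dataSet, groupBy, range, filters = {} }) {
  const params = new URLSearchParams({ dataSet, groupBy, dt: range });
  // filters is serialized as comma-separated key=value pairs, then
  // percent-encoded by URLSearchParams ("=" -> %3D, "," -> %2C).
  const filterList = Object.entries(filters)
    .map(([key, value]) => `${key}=${value}`)
    .join(",");
  if (filterList) params.set("filters", filterList);
  return `https://radar.cloudflare.com/explorer?${params}`;
}

// Reconstructing the crawler-traffic view for the Cryptocurrency industry
// under the Finance vertical, for the first week of August:
const url = dataExplorerUrl({
  dataSet: "ai.bots",
  groupBy: "user_agent",
  range: "2025-08-01_2025-08-07",
  filters: { vertical: "Finance", industry: "Cryptocurrency" },
});
console.log(url);
```

Swapping `groupBy` to `crawl_purpose`, or changing the filter pairs, yields the other views described above.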
    <div>
      <h2>Conclusion</h2>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>AI crawler traffic has become a fact of life for content owners, and the complexity of dealing with it has increased as bots are used for purposes beyond LLM training. <a href="https://contentsignals.org/"><u>Work is underway</u></a> to allow website publishers to declare how automated systems should use their content. However, it will take some time for these proposed solutions to be standardized, and for both publishers and crawlers to adopt them. As the space evolves, we’ll continue to expand Cloudflare Radar’s insights into AI crawler activity.</p><p>If you share our AI-related graphs on social media, be sure to tag us: <a href="https://x.com/CloudflareRadar"><u>@CloudflareRadar</u></a> (X), <a href="https://noc.social/@cloudflareradar"><u>noc.social/@cloudflareradar</u></a> (Mastodon), and <a href="https://bsky.app/profile/radar.cloudflare.com"><u>radar.cloudflare.com</u></a> (Bluesky). If you have questions or comments, you can reach out to us on social media, or contact us via <a><u>email</u></a>.</p><div>
  
</div><p></p> ]]></content:encoded>
            <category><![CDATA[AI Week]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Radar]]></category>
            <category><![CDATA[Traffic]]></category>
            <category><![CDATA[Bots]]></category>
            <guid isPermaLink="false">6PuiWWmAnS4oHYFYoYysBU</guid>
            <dc:creator>David Belson</dc:creator>
        </item>
        <item>
            <title><![CDATA[The age of agents: cryptographically recognizing agent traffic]]></title>
            <link>https://blog.cloudflare.com/signed-agents/</link>
            <pubDate>Thu, 28 Aug 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare now lets websites and bot creators use Web Bot Auth to segment agents from verified bots, making it easier for customers to allow or disallow the many types of user- and partner-directed agents. ]]></description>
            <content:encoded><![CDATA[ <p>On the surface, the goal of handling bot traffic is clear: keep malicious bots away, while letting through the helpful ones. Some bots are evidently malicious — such as mass price scrapers or those testing stolen credit cards. Others are helpful, like the bots that index your website. Cloudflare has segmented this second category of helpful bot traffic through our <a href="https://developers.cloudflare.com/bots/concepts/bot/#verified-bots"><u>verified bots</u></a> program, <a href="https://developers.cloudflare.com/bots/concepts/bot/verified-bots/policy/"><u>vetting</u></a> and validating bots that are transparent about who they are and what they do.</p><p>Today, the rise of <a href="https://agents.cloudflare.com/"><u>agents</u></a> has transformed how we interact with the Internet, often blurring the distinctions between benign and malicious bot actors. Bots are no longer directed only by the bot owners, but also by individual end users to act on their behalf. These bots directed by end users are often working in ways that website owners want to allow, such as planning a trip, ordering food, or making a purchase.</p><p>Our customers have asked us for easier, more granular ways to ensure specific <a href="https://www.cloudflare.com/learning/bots/what-is-a-bot/"><u>bots</u></a>, <a href="https://www.cloudflare.com/learning/bots/what-is-a-web-crawler/"><u>crawlers</u></a>, and <a href="https://www.cloudflare.com/learning/ai/what-is-agentic-ai/"><u>agents</u></a> can reach their websites, while continuing to block bad actors. That’s why we’re excited to introduce <b>signed agents</b>, an extension of our verified bots program that gives a new bot classification in our security rules and in Radar. Cloudflare has long recognized agents — but we’re now endowing them with their own classification to make it even easier for our customers to set the traffic lanes they want for their website. </p>
    <div>
      <h2>The age of agents</h2>
      <a href="#the-age-of-agents">
        
      </a>
    </div>
    <p>Cloudflare has continuously expanded our verified bot categorization to include different functions as the market has evolved. For instance, we first announced our grouping of <a href="https://blog.cloudflare.com/ai-bots/"><u>AI crawler traffic as an official bot category</u></a> in 2023. And in 2024, when OpenAI announced a <a href="https://openai.com/index/searchgpt-prototype/"><u>new AI search prototype</u></a> and introduced <a href="https://platform.openai.com/docs/bots"><u>three different bots</u></a> with distinct purposes, we <a href="https://blog.cloudflare.com/cloudflare-ai-audit-control-ai-content-crawlers/"><u>added three new categories</u></a> to account for this innovation: AI Search, AI Assistant, and Archiver.</p><p>But the bot landscape is constantly evolving. Let's unpack a common type of verified AI bot — an AI crawler such as <a href="https://radar.cloudflare.com/bots/directory/gptbot"><u>GPTBot</u></a>. Even though the bot performs an array of tasks, the bot’s ultimate purpose is a singular, repetitive task on behalf of the operator of that bot: fetch and index information. Its intelligence is applied to performing that singular job on behalf of that bot owner. </p><p>Agents, though, are different. Think about an AI agent tasked by a user to "Book the best deal for a round-trip flight to New York City next month." These agents sometimes use remote browsing products like Cloudflare's <a href="https://developers.cloudflare.com/browser-rendering/"><u>Browser Rendering</u></a> and similar products from companies like Browserbase and Anchor Browser. And here is the key distinction: this particular type of bot isn’t operating on behalf of a single company, like OpenAI in the prior example, but rather the end users themselves. </p>
    <div>
      <h2>Introducing signed agents</h2>
      <a href="#introducing-signed-agents">
        
      </a>
    </div>
    <p>In May, we announced Web Bot Auth, a new method of <a href="https://blog.cloudflare.com/web-bot-auth/"><u>using cryptography to verify bot and agent traffic</u></a>. HTTP message signatures allow bots to authenticate themselves and allow customer origins to identify them. This is one of the authentication methods we use today for our verified bots program. </p><p>What, exactly, is a <a href="https://developers.cloudflare.com/bots/concepts/bot/signed-agents/"><u>signed agent</u></a>? First, they are agents that are generally directed by an end user instead of a single company or entity. Second, the infrastructure or remote browsing platform the agents use is signing their HTTP requests via Web Bot Auth, with Cloudflare validating these message signatures. And last, they comply with our <a href="https://developers.cloudflare.com/bots/concepts/bot/signed-agents/policy/"><u>signed agent policy</u></a>.</p><p>The signed agents classification improves on our existing frameworks in a couple of ways:</p><ol><li><p><b>Increased precision and visibility:</b> we’ve updated the <i>Cloudflare bots and agents directory to include signed agents</i> in addition to verified bots. This allows us to verify the cryptographic signatures of a much wider set of automated traffic, and our customers to granularly apply their security preferences more easily. Bot operators can now <i>submit signed agent applications from the Cloudflare dashboard</i>, allowing bot owners to specify to us how they think we should segment their automated traffic. </p></li><li><p><b>Easier controls from security rules</b>: similar to how they can take action on verified bots as a group, our Enterprise customers will be able to take action on <i>signed agents as a group when configuring their security rules</i>. 
This new field will be available in the Cloudflare dashboard under security rules soon.</p></li></ol><p>To apply to have an agent added to Cloudflare’s directory of bots and agents, customers should complete the <a href="https://dash.cloudflare.com?to=/:account/configurations/bot-submission-form"><u>Bot Submission Form</u></a> in the Cloudflare dashboard. Here, they can specify whether the submission should be considered for the signed agents list or the verified bots list. All signed agents will be recognized by their cryptographic signatures through <a href="https://datatracker.ietf.org/doc/html/draft-meunier-web-bot-auth-architecture"><u>Web Bot Auth validation</u></a>. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5caeGdhlmI3dO3GNZKeEUg/0dac239a94732404861b3876f6bdb8b6/BLOG-2930_2.png" />
          </figure><p><sub>The Bot Submission Form, available in the Cloudflare dashboard for bot owners to submit both verified bot and signed agent applications.</sub></p><p>We want to be clear: our verified bots program isn’t going anywhere. In fact, well-behaved and transparent applications that make use of signed agents can further qualify to be a verified bot, if their specific service adheres to our <a href="https://developers.cloudflare.com/bots/concepts/bot/verified-bots/policy/"><u>policy</u></a>. For instance,<a href="https://radar.cloudflare.com/scan"> <u>Cloudflare Radar's URL Scanner</u></a>, which relies on Browser Rendering as a service to scan URLs, is a <a href="https://radar.cloudflare.com/bots/directory/cloudflare-radar-url-scanner"><u>verified bot</u></a>. While Browser Rendering itself does not qualify to be a verified bot, URL Scanner does, since the bot owner (in this case, Cloudflare Radar) directs the traffic sent by the bot and always identifies itself with a unique Web Bot Auth signature — distinct from <a href="https://developers.cloudflare.com/browser-rendering/reference/automatic-request-headers/"><u>Browser Rendering’s signature</u></a>. </p>
    <div>
      <h2>From an agent’s perspective… </h2>
      <a href="#from-an-agents-perspective">
        
      </a>
    </div>
    <p>Since the launch of Web Bot Auth, our own Browser Rendering product has been sending signed Web Bot Auth HTTP headers, and is always given a bot score of 1 for our Bot Management customers. As of today, Browser Rendering will now show up in this new signed agent category. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1F8Z0E6WqJTxLf9G3PLB3a/84e80539be402066fe02ab60c431100a/BLOG-2930_3.png" />
          </figure><p>We’re also excited to announce the first cohort of agents that we’re partnering with and will be classifying as signed agents: <a href="https://openai.com/index/introducing-chatgpt-agent/"><u>ChatGPT agent</u></a>, <a href="https://block.xyz/inside/block-open-source-introduces-codename-goose"><u>Goose</u></a> from Block, <a href="https://docs.browserbase.com/introduction/what-is-browserbase"><u>Browserbase</u></a>, and <a href="https://anchorbrowser.io/"><u>Anchor Browser</u></a>. They are perfect examples of this new classification because their remote browsers are used by their end customers, not necessarily the companies themselves. We’re thrilled to partner with these teams to take this critical step for the AI ecosystem:</p><blockquote><p>“<i>When we built Goose as an open source tool, we designed it to run locally with an extensible architecture that lets developers automate complex workflows. As Goose has evolved to interact with external services and third-party sites on users' behalf, Web Bot Auth enables those sites to trust Goose while preserving what makes it unique. </i><b><i>This authentication breakthrough unlocks entirely new possibilities for autonomous agents</i></b>." – <b>Douwe Osinga</b>, Staff Software Engineer, Block</p></blockquote><blockquote><p><i>"At Browserbase, we provide web browsing capabilities for some of the largest AI applications. We're excited to partner with Cloudflare to support the adoption of Web Bot Auth, a critical layer of identity for agents. </i><b><i>For AI to thrive, agents need reliable, responsible web access.</i></b><i>"</i>  – <b>Paul Klein</b>, CEO, Browserbase</p></blockquote><blockquote><p><i>“Anchor Browser has partnered with Cloudflare to let developers ship verified browser agents. This way </i><b><i>trustworthy bots get reliable access while sites stay protected</i></b><i>.”</i> – <b>Idan Raman</b>, CEO, Anchor Browser</p></blockquote>
    <div>
      <h2>Updated visibility on Radar</h2>
      <a href="#updated-visibility-on-radar">
        
      </a>
    </div>
    <p>We want everyone to be in the know about our bot classifications. Cloudflare began publishing verified bots on our Radar page <a href="https://radar.cloudflare.com/bots#verified-bots"><u>back in 2022</u></a>, meaning anyone on the Internet — Cloudflare customer or not — can see all of our <a href="https://radar.cloudflare.com/bots#verified-bots"><u>verified bots on Radar</u></a>. We dynamically update the list of bots, but show more than just a list: we announced on <a href="https://www.cloudflare.com/en-gb/press-releases/2025/cloudflare-just-changed-how-ai-crawlers-scrape-the-internet-at-large/"><u>Content Independence Day</u></a> that <a href="https://blog.cloudflare.com/ai-search-crawl-refer-ratio-on-radar/#one-more-thing"><u>every verified bot would get its own page</u></a> in our public-facing directory on Radar, which includes the traffic patterns that we see for each bot.</p><p>Our directory has been updated to include <a href="https://radar.cloudflare.com/bots/directory"><b><u>both signed agents and verified bots</u></b></a> — we share exactly how Cloudflare classifies the bots that it recognizes, plus we surface all of the traffic that Cloudflare observes from these many recognized agents and bots. Through this updated directory, we’re not only giving better visibility to our customers, but also striving to set a higher standard for transparency of bot traffic on the Internet. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/65QPFjmbBde3EzHTOwElSL/cccc8f23c37716c251e0c21850855265/BLOG-2930_4.png" />
          </figure><p><sub>Cloudflare Radar’s Bots Directory, which lists verified bots and signed agents. This view is filtered to view only agent entries.</sub></p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2wBz7UwrQQzT7rJJnXiF8C/16eed3f1afd95cac32c4bcb647c6e5e6/BLOG-2930_5.png" />
          </figure><p><sub>Cloudflare Radar’s signed agent page for ChatGPT agent, which includes its traffic patterns for the last 7 days, from August 21, 2025 to August 27, 2025. </sub></p>
    <div>
      <h2>What’s now, what’s next</h2>
      <a href="#whats-now-whats-next">
        
      </a>
    </div>
    <p>As of today, the Cloudflare bot directory supports both bots and agents in a more clear-cut way, and customers or agent creators can submit agents to be signed and recognized <a href="https://dash.cloudflare.com/?to=/:account/configurations/bot-submission-form"><u>through their account dashboard</u></a>. In addition, anyone can see our signed agents and their traffic patterns on Radar. Soon, customers will be able to take action on signed agents as a group within their firewall rules, the same way you can take action on our verified bots. </p><p>Agents are changing the way that humans interact with the Internet. Websites need to know what tools are interacting with them, and for the builders of those tools to be able to easily scale. Message signatures help achieve both of these goals, but this is only step one. Cloudflare will continue to make it easier for agents and websites to interact (or not!) at scale, in a seamless way. </p><p>
</p> ]]></content:encoded>
            <category><![CDATA[AI Week]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[AI Bots]]></category>
            <category><![CDATA[Bot Management]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">1LQFWI1jzZnWAqR4iFMLLi</guid>
            <dc:creator>Jin-Hee Lee</dc:creator>
        </item>
        <item>
            <title><![CDATA[The next step for content creators in working with AI bots: Introducing AI Crawl Control]]></title>
            <link>https://blog.cloudflare.com/introducing-ai-crawl-control/</link>
            <pubDate>Thu, 28 Aug 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare launches AI Crawl Control (formerly AI Audit) and introduces easily customizable 402 HTTP responses. ]]></description>
            <content:encoded><![CDATA[ <p><i>Empowering content creators in the age of AI with smarter crawling controls and direct communication channels</i></p><p>Imagine you run a regional news site. Last month an AI bot scraped 3 years of archives in minutes — with no payment and little to no referral traffic. As a small company, you may struggle to get the AI company's attention for a licensing deal. Do you block all crawler traffic, or do you let them in and settle for the few referrals they send? </p><p>It’s picking between two bad options.</p><p>Cloudflare wants to help break that stalemate. On July 1st of this year, we declared <a href="https://www.cloudflare.com/press-releases/2025/cloudflare-just-changed-how-ai-crawlers-scrape-the-internet-at-large/"><u>Content Independence Day</u></a> based on a simple premise: creators deserve control of how their content is accessed and used. Today, we're taking the next step in that journey by releasing AI Crawl Control to general availability — giving content creators and AI crawlers an important new way to communicate.</p>
    <div>
      <h2>AI Crawl Control goes GA</h2>
      <a href="#ai-crawl-control-goes-ga">
        
      </a>
    </div>
    <p>Today, we're rebranding our AI Audit tool as <b>AI Crawl Control</b> and moving it from beta to <b>general availability</b>. This reflects the tool's evolution from simple monitoring to detailed insights and <a href="https://www.cloudflare.com/learning/ai/how-to-block-ai-crawlers/">control over how AI systems can access your content</a>. </p><p>The market response has been overwhelming: content creators across industries needed real agency, not just visibility. AI Crawl Control delivers that control.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/pIAbmCR0tTK71umann3w0/e570c5f898e3d399babf6d1f82c2f3d8/image3.png" />
          </figure>
    <div>
      <h2>Using HTTP 402 to help publishers license content to AI crawlers</h2>
      <a href="#using-http-402-to-help-publishers-license-content-to-ai-crawlers">
        
      </a>
    </div>
    <p>Many content creators have faced a binary choice: either block all AI crawlers and miss potential licensing opportunities and referral traffic, or allow them through without any compensation. They had no practical way to say "we're open for business, but let's talk terms first."</p><p>Our customers are telling us:</p><ul><li><p>We want to license our content, but crawlers don't know how to reach us. </p></li><li><p>Blanket blocking feels like we're closing doors on potential revenue and referral traffic. </p></li><li><p>We need a way to communicate our terms before crawling begins. </p></li></ul><p>To address these needs, we are making it easier than ever to send customizable <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/402">402 HTTP status codes</a>. </p><p>Our <a href="https://blog.cloudflare.com/introducing-pay-per-crawl/#what-if-i-could-charge-a-crawler"><u>private beta launch of Pay Per Crawl</u></a> put the HTTP 402 (“Payment Required”) response code to use, working in tandem with Web Bot Auth to enable direct payments between agents and content creators. Today, we’re making customizable 402 response codes available to every paid Cloudflare customer — not just pay per crawl users.</p><p>Here's how it works: in AI Crawl Control, paying Cloudflare customers will be able to select individual bots to block with a configurable message parameter and send 402 Payment Required responses. Think: "To access this content, email partnerships@yoursite.com or call 1-800-LICENSE" or "Premium content available via API at api.yoursite.com/pricing."</p><p>On an average day, Cloudflare customers are already sending over one billion 402 response codes. This shows a deep desire to move beyond blocking to open communication channels and new monetization models. 
With the 402 HTTP status code, content creators can tell crawlers exactly how to properly license their content, creating a direct path from crawling to a commercial agreement. We are excited to make this easier than ever in the AI Crawl Control dashboard. </p>
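The response flow described above can be sketched in a few lines. This is not AI Crawl Control's implementation (the dashboard configures it for you), and the user-agent matching and message text below are illustrative placeholders:

```javascript
// Sketch: answering a recognized AI crawler with an HTTP 402 that carries
// licensing contact details instead of a bare block.
const LICENSE_MESSAGE =
  "To access this content, email partnerships@yoursite.com or call 1-800-LICENSE."; // example text

function respondToCrawler(userAgent) {
  // Hypothetical allow/deny check; real classification is done by bot
  // management, not user-agent sniffing.
  const looksLikeAICrawler = /GPTBot|ClaudeBot|Bytespider/i.test(userAgent);
  if (!looksLikeAICrawler) {
    return { status: 200, body: "normal page" };
  }
  return {
    status: 402, // "Payment Required"
    headers: { "content-type": "text/plain" },
    body: LICENSE_MESSAGE,
  };
}

console.log(respondToCrawler("GPTBot/1.1").status); // 402
```

A crawler that receives the 402 gets both a machine-readable signal (the status code) and a human-readable next step (the message), instead of an opaque block.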
    <div>
      <h2>How to customize your 402 status code with AI Crawl Control: </h2>
      <a href="#how-to-customize-your-402-status-code-with-ai-crawl-control">
        
      </a>
    </div>
    <p><b>For Paid Plan Users:</b></p><ul><li><p>When you block individual crawlers from the AI Crawl Control dashboard, you can now choose to send 402 Payment Required status codes and customize your message. For example: <b>To access this content, email partnerships@yoursite.com or call 1-800-LICENSE</b>.</p></li></ul><p>The response will look like this:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5v5x41azcAK14DBhXjXPEX/8c0960b4bb556d62e88d19c9dd544f12/image4.png" />
          </figure><p>The message can be configured from Settings in the AI Crawl Control Dashboard:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2KMdRYwoey9RdYIxmzmFO1/7b39fd82d43349ee1cc4832cb602eb56/image1.png" />
          </figure>
    <div>
      <h2>Beyond just blocking AI bots</h2>
      <a href="#beyond-just-blocking-ai-bots">
        
      </a>
    </div>
    <p>This is just the beginning. We're planning to add additional parameters that will let crawlers understand the content's value, freshness, and licensing terms directly in the 402 response. Imagine crawlers receiving structured data about content quality and update frequency, for example, in addition to contact information.</p><p>Meanwhile, <a href="https://blog.cloudflare.com/introducing-pay-per-crawl/">pay per crawl</a> continues advancing through beta, giving content creators the infrastructure to automatically monetize crawler access with transparent, usage-based pricing.</p><p>What excites us most is the market shift we're seeing. We're moving to a world where content creators have clear monetization paths to become active participants in the development of rich AI experiences. </p><p>The 402 response is a bridge between two industries that want to work together: content creators whose work fuels AI development, and AI companies who need high-quality data. Cloudflare’s AI Crawl Control creates the infrastructure for these partnerships to flourish.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/31Np3qX2ssbeGaJnZHQodA/92246d3618778715c2e8b295b7acaa29/image5.png" />
          </figure><div>
  
</div><p></p> ]]></content:encoded>
            <category><![CDATA[AI Week]]></category>
            <category><![CDATA[Pay Per Crawl]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[AI Bots]]></category>
            <category><![CDATA[Bot Management]]></category>
            <guid isPermaLink="false">3UcNgGUfIUIm0EEtNwgLAT</guid>
            <dc:creator>Will Allen</dc:creator>
            <dc:creator>Pulkita Kini</dc:creator>
            <dc:creator>Cam Whiteside</dc:creator>
        </item>
        <item>
            <title><![CDATA[Announcing the Cloudflare Browser Developer Program]]></title>
            <link>https://blog.cloudflare.com/announcing-the-cloudflare-browser-developer-program/</link>
            <pubDate>Mon, 18 Aug 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Announcing the Browser Developer Program: Cloudflare’s new collaborative program to help shape Cloudflare challenges that work seamlessly with your browser. Join us today! ]]></description>
            <content:encoded><![CDATA[ <p>Today, we are announcing Cloudflare’s <b>Browser Developer Program</b>, a collaborative initiative to strengthen partnership between Cloudflare and browser development teams.</p><p>Browser developers can apply to join <a href="https://forms.gle/fx8odhNNeqFELqVB9"><u>here</u></a>. </p><p>At Cloudflare, we aim to help build a better Internet. One way we achieve this is by providing website owners with the tools to detect and block unwanted traffic from bots through Cloudflare <a href="https://developers.cloudflare.com/cloudflare-challenges/"><u>Challenges</u></a> or <a href="https://developers.cloudflare.com/turnstile/"><u>Turnstile</u></a>. As both bots and our detection systems become more sophisticated, the security checks required to validate human traffic become more complicated. While we aim to strike the right balance, we recognize these security measures can sometimes cause issues for legitimate browsers and their users.</p>
    <div>
      <h2>Building a better web together</h2>
      <a href="#building-a-better-web-together">
        
      </a>
    </div>
    <p>A core objective of the program is to provide a space for intentional collaboration where we can work directly with browser developers to ensure that both accessibility and security can co-exist. We aim to support the evolving browser landscape, while upholding our responsibility to our customers to deliver the best security products. This program provides a dedicated channel for browser teams to share feedback, report issues, and help ensure that Cloudflare’s Challenges and Turnstile work seamlessly with all browsers.</p>
    <div>
      <h2>What the program includes</h2>
      <a href="#what-the-program-includes">
        
      </a>
    </div>
    <p>Browser developers in the program will benefit from:</p><ul><li><p>A two-way communication channel to Cloudflare’s team dedicated to addressing browser-specific concerns, feedback, and issues.</p></li><li><p>Best practices for building and testing against Cloudflare Challenges and Turnstile.</p></li><li><p>A private community forum for updates, questions, and discussion between browser developers and Cloudflare engineers. </p></li><li><p>Early visibility into updates or changes that may impact how your browser handles Cloudflare Challenges.</p></li><li><p>(If applicable) Testing integration where we will incorporate your browser into our testing pipeline and monitor its performance with our releases.</p></li></ul><p>This program is designed as a partnership where Cloudflare will, with our best effort, ensure our security products work properly with all browsers, while giving browser developers a voice in how these systems evolve. As an output of this program, we expect to publish clear browser requirements to run Cloudflare Challenges while striking the balance between openness and security. </p><p>For end users browsing the web, we continue to support a wide range of <a href="https://developers.cloudflare.com/cloudflare-challenges/reference/supported-browsers/"><u>browsers</u></a>. We will continue to update this list based on the insights and collaborations from the Browser Developer Program. We are also committed to ensuring our <a href="https://developers.cloudflare.com/cloudflare-challenges/challenge-types/challenge-pages/"><u>Challenge interstitial pages</u></a> and <a href="https://developers.cloudflare.com/turnstile/"><u>Turnstile</u></a> provide clear, actionable UI/UX for any error or failed states, making it easier for you to understand and resolve issues you may encounter. </p>
    <div>
      <h2>How to apply</h2>
      <a href="#how-to-apply">
        
      </a>
    </div>
    <p>If you are working on a browser and want to ensure your users have a seamless experience with Cloudflare-protected websites, we encourage you to apply <a href="https://forms.gle/fx8odhNNeqFELqVB9"><u>here</u></a>. </p><p>We’ll ask for basic information about your project and have you sign our Browser Developer Program Agreement. In addition, we expect participants to adhere to our Community Code of Conduct and commit to constructive engagement.</p><p>Once you’re accepted, you’ll be invited to a private space in the Cloudflare Community where you can engage directly with our team. </p>
    <div>
      <h2>Why is this important?</h2>
      <a href="#why-is-this-important">
        
      </a>
    </div>
    <p>Cloudflare <a href="https://developers.cloudflare.com/cloudflare-challenges/"><u>Challenges</u></a>, a security mechanism to verify whether a visitor is a human or a bot, serve a wide variety of browsers in the world today. Chrome leads with 68.0%, followed by Safari at 8.7%, Firefox at 6.3%, Edge at 4.8%, and Opera at 6.2%. However, a very long tail of browsers collectively makes up the remaining traffic, each representing less than 1% individually but together painting a picture of an incredibly diverse web ecosystem.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7HlxV6qe25cwxRipsbap0V/3859c5065e51e3f8f37b4b18fef5cee8/BLOG-2804_2.png" />
          </figure><p><sub><i>Browser traffic distribution, with 100+ browsers comprising the 'Other' category</i></sub></p><p>This diversity spans a wide range of environments, each with unique constraints and capabilities:</p><ul><li><p>Emerging and experimental browsers pushing the boundaries of web technology</p></li><li><p>Privacy-focused browsers such as DuckDuckGo that prioritize user data protection</p></li><li><p>Embedded browsers inside social media apps like Facebook, Instagram, and TikTok</p></li><li><p>WebViews used by mobile applications</p></li><li><p>Gaming and VR browsers such as Oculus for headsets and gaming consoles</p></li><li><p>Smart device browsers built into classroom displays and home appliances</p></li></ul><p>Supporting this level of diversity poses real engineering challenges. Many of these browsers deviate from standard assumptions. Some lack full support for modern Web APIs, others operate under more stringent data privacy policies, and some are optimized for environments where our script to verify visitors may be hindered or blocked from running properly. These browsers are not bad or malicious. But their behavior may fall outside the typical patterns observed in mainstream browsers, which can lead to problematic or failed Challenge flows that we would like to avoid.</p><p>From an engineering perspective, our job is to strike a difficult balance. If our logic is so rigid that it expects only the behaviors of the majority, we risk excluding legitimate users on less conventional platforms. But if we relax our standards too much, we increase the attack surface for abuse. We cannot overfit to the top 5 browsers, nor can we afford to treat all clients as equal in capability or trustworthiness.</p><p>The Browser Developer Program is one way to close this gap. 
By working directly with browser teams, especially those building for niche or emerging environments, we can better understand the constraints they operate under and collaborate to make each of our systems more compatible and resilient. </p>
    <div>
      <h2>Join us!</h2>
      <a href="#join-us">
        
      </a>
    </div>
    <p>This program is free to join, and is open to any browser developer, no matter the size or the lifecycle stage. Our goal is to listen, learn, and collaborate with browser developers to create a better experience for everyone. </p><p>We believe this program will ultimately benefit end users the most. By joining this program, you will help us build solutions that prioritize both the security needs of businesses as well as the diverse ways people access the Internet. </p><p>We look forward to your participation!</p> ]]></content:encoded>
            <category><![CDATA[Turnstile]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Challenge Page]]></category>
            <guid isPermaLink="false">6VcasIRuXCvJ8K2tqUHmkG</guid>
            <dc:creator>Sally Lee</dc:creator>
            <dc:creator>Oliver Payne</dc:creator>
        </item>
        <item>
            <title><![CDATA[Perplexity is using stealth, undeclared crawlers to evade website no-crawl directives]]></title>
            <link>https://blog.cloudflare.com/perplexity-is-using-stealth-undeclared-crawlers-to-evade-website-no-crawl-directives/</link>
            <pubDate>Mon, 04 Aug 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ Perplexity is repeatedly modifying their user agent and changing IPs and ASNs to hide their crawling activity, in direct conflict with explicit no-crawl preferences expressed by websites. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>We are observing stealth crawling behavior from Perplexity, an AI-powered answer engine. Although Perplexity initially crawls from their declared user agent, when they are presented with a network block, they appear to obscure their crawling identity in an attempt to circumvent the website’s preferences. We see continued evidence that Perplexity is repeatedly modifying their user agent and changing their source <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/"><u>ASNs</u></a> to hide their crawling activity, as well as ignoring — or sometimes failing to even fetch — <a href="https://www.cloudflare.com/learning/bots/what-is-robots-txt/"><u>robots.txt</u> </a>files.</p><p>The Internet as we have known it for the past three decades is <a href="https://blog.cloudflare.com/content-independence-day-no-ai-crawl-without-compensation/"><u>rapidly changing</u></a>, but one thing remains constant: it is built on trust. There are clear preferences that crawlers should be transparent, serve a clear purpose, perform a specific activity, and, most importantly, follow website directives and preferences. Based on Perplexity’s observed behavior, which is incompatible with those preferences, we have de-listed them as a verified <a href="https://www.cloudflare.com/learning/bots/what-is-a-bot/">bot</a> and added heuristics to our managed rules that <a href="https://www.cloudflare.com/learning/ai/how-to-block-ai-crawlers/">block this stealth crawling</a>.</p>
    <div>
      <h3>How we tested</h3>
      <a href="#how-we-tested">
        
      </a>
    </div>
    <p>We received complaints from customers who had both disallowed Perplexity crawling activity in their <code>robots.txt</code> files and also created <a href="https://www.cloudflare.com/learning/ddos/glossary/web-application-firewall-waf/">WAF rules</a> to specifically block both of Perplexity’s <a href="https://docs.perplexity.ai/guides/bots"><u>declared crawlers</u></a>: <code>PerplexityBot</code> and <code>Perplexity-User</code>. These customers told us that Perplexity was still able to access their content even when they saw its bots successfully blocked. We confirmed that Perplexity’s crawlers were in fact being blocked on the specific pages in question, and then performed several targeted tests to confirm exactly what behavior we could observe.</p><p>We created multiple brand-new <a href="https://www.cloudflare.com/learning/dns/glossary/what-is-a-domain-name/">domains</a>, similar to <code>testexample.com</code> and <code>secretexample.com</code>. These domains were newly purchased and had not yet been indexed by any search engine nor made publicly accessible in any discoverable way. We implemented a <code>robots.txt</code> file with directives to stop all respectful bots from accessing any part of a website:  </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/66QyzKuX9DQqQYPvCZpw4m/78e7bbd4ff79dd2f1523e70ef54dab9e/BLOG-2879_-_2.png" />
          </figure><p>We conducted an experiment by querying Perplexity AI with questions about these domains, and discovered Perplexity was still providing detailed information regarding the exact content hosted on each of these restricted domains. This response was unexpected, as we had taken all necessary precautions to prevent this data from being retrievable by their <a href="https://www.cloudflare.com/learning/bots/what-is-a-web-crawler/"><u>crawlers</u></a>.</p>
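<p>For readers without the image, a disallow-all policy of the kind we deployed needs only two directives in its standard form (a sketch of the format defined in RFC 9309; the exact file we served is shown in the screenshot above):</p>

```
User-agent: *
Disallow: /
```

<p>Any crawler that honors the Robots Exclusion Protocol must treat every path on the site as off limits.</p>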
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/08ZLg0OE7vX8x35f9rDeg/a3086959793ac565b329fbbab5e52d1e/BLOG-2879_-_3.png" />
          </figure><p></p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5uHc0gooXlr98LB56KBb3g/b7dae5987a64f2442d1f89cf21e974ba/BLOG-2879_-_4.png" />
          </figure>
    <div>
      <h3>Obfuscating behavior observed</h3>
      <a href="#obfuscating-behavior-observed">
        
      </a>
    </div>
    <p><b>Bypassing robots.txt and undisclosed IPs/user agents</b></p><p>Our test domains explicitly prohibited all automated access via directives in their robots.txt files, and had specific WAF rules that blocked crawling from <a href="https://docs.perplexity.ai/guides/bots"><u>Perplexity’s public crawlers</u></a>. We observed that Perplexity uses not only their declared user agent, but also a generic user agent intended to impersonate Google Chrome on macOS when their declared crawler was blocked. </p><table><tr><td><p>Crawler</p></td><td><p>User agent</p></td><td><p>Volume</p></td></tr><tr><td><p>Declared</p></td><td><p>Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Perplexity-User/1.0; +https://perplexity.ai/perplexity-user)</p></td><td><p>20-25m daily requests</p></td></tr><tr><td><p>Stealth</p></td><td><p>Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36</p></td><td><p>3-6m daily requests</p></td></tr></table><p>Both their declared and undeclared crawlers were attempting to access the content for scraping, contrary to the web crawling norms outlined in RFC <a href="https://datatracker.ietf.org/doc/html/rfc9309"><u>9309</u></a>.</p><p>This undeclared crawler utilized multiple IPs not listed in <a href="https://docs.perplexity.ai/guides/bots"><u>Perplexity’s official IP range</u></a>, and would rotate through these IPs in response to the restrictive robots.txt policy and the block from Cloudflare. In addition to rotating IPs, we observed requests coming from different ASNs in attempts to further evade website blocks. This activity was observed across tens of thousands of domains and millions of requests per day. We were able to fingerprint this crawler using a combination of <a href="https://www.cloudflare.com/learning/ai/what-is-machine-learning/">machine learning</a> and network signals.</p><p>An example: </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4UKtFs1UPddDh9OCtMuwzC/bcdabf5fdd9b0d029581b14a90714d91/unnamed.png" />
          </figure><p>Of note: when the stealth crawler was successfully blocked, we observed that Perplexity uses other data sources — including other websites — to try to create an answer. However, these answers were less specific and lacked details from the original content, reflecting the fact that the block had been successful. </p>
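<p>The user agents in the table above illustrate why the stealth traffic is hard to attribute: the declared crawler self-identifies with a stable product token, while the stealth one is byte-for-byte a generic desktop Chrome string. A minimal sketch (the check below is illustrative, not our actual detection logic):</p>

```python
# User agent strings copied from the table above
DECLARED_UA = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
               "Perplexity-User/1.0; +https://perplexity.ai/perplexity-user)")
STEALTH_UA = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36")

def is_declared_perplexity(ua: str) -> bool:
    # Declared crawlers carry a stable token and a contact URL in the UA
    return "PerplexityBot" in ua or "Perplexity-User" in ua

# The declared UA is trivially attributable...
assert is_declared_perplexity(DECLARED_UA)
# ...while the stealth UA is indistinguishable from desktop Chrome by user
# agent alone, which is why machine learning and network signals are needed.
assert not is_declared_perplexity(STEALTH_UA)
```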
    <div>
      <h2>How well-meaning bot operators respect website preferences</h2>
      <a href="#how-well-meaning-bot-operators-respect-website-preferences">
        
      </a>
    </div>
    <p>In contrast to the behavior described above, the Internet has expressed clear preferences on how good crawlers should behave. All well-intentioned crawlers acting in good faith should:</p><p><b>Be transparent</b>. Identify themselves honestly, using a unique user-agent, a declared list of IP ranges or <a href="https://developers.cloudflare.com/bots/concepts/bot/verified-bots/web-bot-auth/"><u>Web Bot Auth</u></a> integration, and provide contact information if something goes wrong.</p><p><b>Be well-behaved netizens</b>. Don’t flood sites with excessive traffic, <a href="https://www.cloudflare.com/learning/bots/what-is-data-scraping/"><u>scrape</u></a> sensitive data, or use stealth tactics to try and dodge detection.</p><p><b>Serve a clear purpose</b>. Whether it’s powering a voice assistant, checking product prices, or making a website more accessible, every bot has a reason to be there. The purpose should be clearly and precisely defined and easy for site owners to look up publicly.</p><p><b>Separate bots for separate activities</b>. Perform each activity from a unique bot. This makes it easy for site owners to decide which activities they want to allow. Don’t force site owners to make an all-or-nothing decision. </p><p><b>Follow the rules</b>. That means checking for and respecting website signals like <code>robots.txt</code>, staying within rate limits, and never bypassing security protections.</p><p>More details are outlined in our official <a href="https://developers.cloudflare.com/bots/concepts/bot/verified-bots/policy/"><u>Verified Bots Policy Developer Docs</u></a>.</p><p>OpenAI is an example of a leading AI company that follows these best practices. They clearly <a href="https://platform.openai.com/docs/bots"><u>outline their crawlers</u></a> and give detailed explanations for each crawler’s purpose. They respect robots.txt and do not try to evade either a robots.txt directive or a network-level block. 
And <a href="https://openai.com/index/introducing-chatgpt-agent/"><u>ChatGPT Agent</u></a> is signing HTTP requests using the newly proposed open standard <a href="https://developers.cloudflare.com/bots/concepts/bot/verified-bots/web-bot-auth/"><u>Web Bot Auth</u></a>.</p><p>When we ran the same test as outlined above with ChatGPT, we found that ChatGPT-User fetched the robots.txt file and stopped crawling when it was disallowed. We did not observe follow-up crawls from any other user agents or third-party bots. When we removed the disallow directive from the robots.txt file, but presented ChatGPT with a block page, they again stopped crawling, and we saw no additional crawl attempts from other user agents. Both of these demonstrate the appropriate response to website owner preferences.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/HMJjS7DRmu4octZ99HX8K/753966a88476f80d7a981b1c135fd251/BLOG-2879_-_6.png" />
          </figure>
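<p>The respectful-crawler behavior described above (fetch robots.txt, honor a disallow, and stop) can be sketched with Python's standard library; the bot name and URL here are illustrative:</p>

```python
from urllib import robotparser

# A disallow-all robots.txt, like the one served on our test domains
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
])

# A polite crawler checks before every fetch and honors the answer
if not rp.can_fetch("ExampleBot/1.0", "https://testexample.com/page"):
    print("disallowed: stop crawling")
```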
    <div>
      <h2>How can you protect yourself?</h2>
      <a href="#how-can-you-protect-yourself">
        
      </a>
    </div>
    <p>All the undeclared crawling activity that we observed from Perplexity’s hidden User Agent was scored by our <a href="https://www.cloudflare.com/application-services/products/bot-management/">bot management system </a>as a bot and was unable to pass managed challenges. Any bot management customer who has an existing block rule in place is already protected. Customers who don’t want to block traffic can set up rules to <a href="https://developers.cloudflare.com/waf/custom-rules/use-cases/challenge-bad-bots/"><u>challenge requests</u></a>, giving real humans an opportunity to proceed. Customers with existing challenge rules are already protected. Lastly, we added signature matches for the stealth crawler into our <a href="https://developers.cloudflare.com/bots/concepts/bot/#ai-bots"><u>managed rule</u></a> that <a href="https://developers.cloudflare.com/bots/additional-configurations/block-ai-bots/"><u>blocks AI crawling activity</u></a>. This rule is available to all customers, including our free customers.  </p>
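<p>As a concrete example, a custom WAF rule that blocks Perplexity's declared crawlers by user agent could use an expression along these lines (a sketch in Cloudflare's rules language, paired with a block or managed challenge action):</p>

```
(http.user_agent contains "PerplexityBot") or (http.user_agent contains "Perplexity-User")
```

<p>The stealth crawler carries no such token, which is why it is covered by the managed rule and bot score rather than a user-agent match.</p>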
    <div>
      <h2>What’s next?</h2>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>It's been just over a month since we announced <a href="https://blog.cloudflare.com/content-independence-day-no-ai-crawl-without-compensation/">Content Independence Day</a>, giving content creators and publishers more control over how their content is accessed. Today, over two and a half million websites have chosen to completely disallow AI training through our managed robots.txt feature or our <a href="https://developers.cloudflare.com/bots/concepts/bot/#ai-bots"><u>managed rule blocking AI Crawlers</u></a>. Every Cloudflare customer is now able to selectively decide which declared AI crawlers are able to access their content in accordance with their business objectives.</p><p>We expected a change in bot and crawler behavior based on these new features, and we expect that the techniques bot operators use to evade detection will continue to evolve. Once this post is live, the behavior we saw will almost certainly change, and the methods we use to stop them will keep evolving as well. </p><p>Cloudflare is actively working with technical and policy experts around the world, like the IETF efforts to standardize <a href="https://ietf-wg-aipref.github.io/drafts/draft-ietf-aipref-vocab.html?cf_target_id=_blank"><u>extensions to robots.txt</u></a>, to establish clear and measurable principles that well-meaning bot operators should abide by. We think this is an important next step in this quickly evolving space.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/25VWBDa33UWxDOtqEVEx5o/41eb4ddc262551b83179c1c23a9cb1e6/BLOG-2879_-_7.png" />
          </figure><p></p> ]]></content:encoded>
            <category><![CDATA[Cloudforce One]]></category>
            <category><![CDATA[Threat Intelligence]]></category>
            <category><![CDATA[AI Bots]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Bot Management]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Generative AI]]></category>
            <guid isPermaLink="false">6XJtrSa1t6frcelkMGuYOV</guid>
            <dc:creator>Gabriel Corral</dc:creator>
            <dc:creator>Vaibhav Singhal</dc:creator>
            <dc:creator>Brian Mitchell</dc:creator>
            <dc:creator>Reid Tatoris</dc:creator>
        </item>
        <item>
            <title><![CDATA[The crawl before the fall… of referrals: understanding AI’s impact on content providers]]></title>
            <link>https://blog.cloudflare.com/ai-search-crawl-refer-ratio-on-radar/</link>
            <pubDate>Tue, 01 Jul 2025 10:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare Radar now shows how often a given AI model sends traffic to a site relative to how often it crawls that site. This helps site owners make decisions about which AI bots to allow or block.
 ]]></description>
            <content:encoded><![CDATA[ <p>Content publishers welcomed crawlers and bots from search engines because they helped drive traffic to their sites. The <a href="https://www.cloudflare.com/learning/bots/what-is-a-web-crawler/"><u>crawlers</u></a> would see what was published on the site and surface that material to users searching for it. Site owners could monetize their material because those users still needed to click through to the page to access anything beyond a short title.</p><p><a href="https://www.cloudflare.com/learning/ai/what-is-artificial-intelligence/"><u>Artificial Intelligence (AI)</u></a> bots also crawl the content of a site, but with an entirely different delivery model. These <a href="https://www.cloudflare.com/learning/ai/what-is-large-language-model/"><u>Large Language Models (LLMs)</u></a> do their best to read the web to train a system that can repackage that content for the user, without the user ever needing to visit the original publication.</p><p>The AI applications might still try to cite the content, but we’ve found that very few users actually click through relative to how often the AI bot <a href="https://www.cloudflare.com/learning/bots/what-is-content-scraping/"><u>scrapes</u></a> a given website. We have discussed this challenge in smaller settings, and today we are excited to publish our findings as <a href="https://radar.cloudflare.com/ai-insights#crawl-to-refer-ratio"><u>a new metric shown on the AI Insights page on Cloudflare Radar</u></a>.</p><p>Visitors to Cloudflare Radar can now review how often a given AI model sends traffic to a site relative to how often it crawls that site. We are sharing this analysis with a broad audience so that site owners can have better information to help them make decisions about which AI bots to allow or block and so that users can understand how AI usage in aggregate impacts Internet traffic.</p>
    <div>
      <h2>How does this measurement work?</h2>
      <a href="#how-does-this-measurement-work">
        
      </a>
    </div>
    <p>As HTML pages are arguably the most valuable content for these crawlers, the ratios displayed are calculated by dividing the total number of requests from relevant user agents associated with a given search or AI platform where the response was of <code>Content-type: text/html</code> by the total number of requests for HTML content where the <code>Referer</code> header contained a hostname associated with a given search or AI platform.</p><p>The diagrams below illustrate two common crawling scenarios, and show that companies may use different user agents depending on the purpose of the crawler. The top one represents a simple transaction where the example AI platform is requesting content for the purposes of training an LLM, representing itself as <code>AIBot</code>. The bottom one represents a scenario where the example AI platform is requesting content to service a user request — looking for flight information, for example. In this case, it is representing itself as <code>AIBot-User</code>. Request traffic from both of these user agents would be aggregated under a single platform name for the purposes of our analysis. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3SOsmpe6TAWwqK6g9irLI2/cca037eadf97578f7851e24ba6b90af4/image9.png" />
          </figure><p>When a user clicks on a link on a website or application, the client will often send a <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Referer"><code><u>Referer:</u></code><u> header</u></a> as part of the request to the target site. In the diagram below, the example AI platform has returned content that contains links to external sites in response to a user interaction. When the user clicks on a link, a request is made to the content provider that includes <code>ai.example.com </code>in the <code>Referer:</code> header, letting them know where that request traffic came from. Hostnames are associated with their respective platforms for the purpose of our analysis.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5WqrD6q6k4ng8sBLbgzp42/b139464c5653d3cab533bf6413930a62/image10.png" />
          </figure>
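<p>Put together, the crawl-to-refer ratio described above reduces to a simple aggregation over HTML request logs. A minimal sketch, using hypothetical log records and the example AIBot platform from the diagrams (the field names are illustrative, not Cloudflare's schema):</p>

```python
from urllib.parse import urlparse

# Hypothetical request-log records for a content site
requests = [
    {"user_agent": "AIBot/1.0", "referer": "", "content_type": "text/html"},
    {"user_agent": "AIBot-User/1.0", "referer": "", "content_type": "text/html"},
    {"user_agent": "Mozilla/5.0", "referer": "https://ai.example.com/chat", "content_type": "text/html"},
]

# Both crawler user agents aggregate under a single platform, as described above
CRAWLER_TOKENS = ("AIBot", "AIBot-User")
PLATFORM_HOSTS = {"ai.example.com"}

def crawl_to_refer_ratio(logs):
    html = [r for r in logs if r["content_type"] == "text/html"]
    # HTML requests made by the platform's crawlers
    crawls = sum(any(t in r["user_agent"] for t in CRAWLER_TOKENS) for r in html)
    # HTML requests referred by a hostname associated with the platform
    refers = sum(urlparse(r["referer"]).hostname in PLATFORM_HOSTS for r in html)
    # Normalized to a single referral request, i.e. an n:1 ratio
    return crawls / refers if refers else float("inf")

print(crawl_to_refer_ratio(requests))  # 2.0 -> two crawl requests per referral
```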
    <div>
      <h2>Observations</h2>
      <a href="#observations">
        
      </a>
    </div>
    
    <div>
      <h3>Reviewing the ratios</h3>
      <a href="#reviewing-the-ratios">
        
      </a>
    </div>
    <p>The new metric is presented as a simple table, comparing the number of aggregate HTML page requests from crawlers (user agents) associated with a given platform to the number of HTML page requests from clients referred by a hostname associated with a given platform. The calculated ratio is always normalized to a single referral request.</p><p>The table below shows that for the period June 19-26, 2025, as an example, the ratios range from Anthropic’s 70,900:1 down to Mistral’s 0.1:1. This means that Anthropic’s AI platform Claude made nearly 71,000 HTML page requests for every HTML page referral, while Mistral sent 10x as many referrals as crawl requests. (However, traffic referred by Claude’s native app does not include a <code>Referer:</code> header, and we believe that the same holds true for traffic generated from other native apps as well. As such, because the referral counts only include traffic from the Web-based tools from these providers, these calculations may overstate the respective ratios, but it is unclear by how much.)</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1JaUDnjXMlq5YMxuKZGh7b/31210c8cd80779974450adfb4909f1cd/image7.png" />
          </figure><p>Of course, due in part to changes in crawling patterns, these ratios will change over time. The table above also displays the ratio changes as compared to the previous period, with changes ranging from increases of over 6% for DuckDuckGo and Yandex to Google’s 19.4% decrease. The week-over-week drop in Google’s ratio is related to an observed drop in crawling traffic from <code>GoogleBot</code> starting on June 24, while Yandex’s week-over-week growth is related to an observed increase in <code>YandexBot</code> crawling activity that started on June 21, as seen in the graphs below.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2UThXDeJepqM6jQCzXMvvw/f2d75d2202c33711f9eaa0a38c01a9f3/image3.png" />
          </figure>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4FDYlEWYztxZCJZMg5RPvf/b4a3dac2dc4a06b709e2ef8d74ea1bc0/image10.png" />
          </figure><p>Radar’s Data Explorer includes a <a href="https://radar.cloudflare.com/explorer?dataSet=bots.crawlers&amp;groupBy=crawl_refer_ratio&amp;dt=2025-05-01_2025-05-28"><u>time series view of how these ratios change over time</u></a>, such as in the Baidu example below. The time series data is also available through an <a href="https://developers.cloudflare.com/api/resources/radar/subresources/bots/subresources/web_crawlers/methods/timeseries_groups/"><u>API endpoint</u></a>.</p>
    <div>
      <h3>Patterns in referral traffic</h3>
      <a href="#patterns-in-referral-traffic">
        
      </a>
    </div>
    <p>Changes and trends in the underlying activity can be seen in the <a href="https://radar.cloudflare.com/explorer?dataSet=bots.crawlers&amp;groupBy=referer&amp;timeCompare=1"><u>associated Data Explorer view</u></a>, as well as in the raw data available via API endpoints (<a href="https://developers.cloudflare.com/api/resources/radar/subresources/bots/subresources/web_crawlers/methods/timeseries_groups/"><u>timeseries</u></a>, <a href="https://developers.cloudflare.com/api/resources/radar/subresources/bots/subresources/web_crawlers/methods/summary/"><u>summary</u></a>). Note that the shares of both referral and crawl traffic are relative to the sets of referrers and crawlers included in the graphs, and not Cloudflare traffic overall.</p><p>For example, in the referrer-centric view below, covering nearly the first four weeks of June 2025, we can see that referral traffic is dominated by search platform Google, with a fairly consistent diurnal pattern visible in the data. (The <code>google.*</code> entry covers referral traffic from the main <a href="http://google.com"><u>google.com</u></a> site, as well as local sites, such as <a href="http://google.es"><u>google.es</u></a> or <a href="http://google.com.tw"><u>google.com.tw</u></a>.) Because of prefetching driven by the use of <a href="https://developer.chrome.com/blog/search-speculation-rules"><u>speculation rules</u></a>, referral traffic coming from Google’s ASN (AS15169) is specifically excluded from analysis here, as it doesn’t represent active user consumption of content.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5pNnqBHkfJEEGioN1dhpi5/65251de2ad63e0cef0ee2340e79f2f4b/image14.png" />
          </figure><p>Clear diurnal patterns are also visible in the referral request shares of other search platforms, although the request shares are a fraction of what is seen from Google.  </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5flVZwDhtYlseH5uYDk76U/a03e9957a10983e87e4fcd8f6a9e59bf/image4.png" />
          </figure><p>Throughout June, the share of traffic referred by AI platforms was significantly lower, even in aggregate, than the share of traffic referred by search platforms.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/705m9ac6GXGgT4qshubY70/3c6c0ca43be66114be53fa607bcb857d/image8.png" />
          </figure>
    <div>
      <h3>Changes in crawling traffic</h3>
      <a href="#changes-in-crawling-traffic">
        
      </a>
    </div>
    <p>As noted above, the change in ratio values over time can be driven by shifts in crawling activity. These shifts are visible in the <a href="https://radar.cloudflare.com/explorer?dataSet=bots.crawlers&amp;groupBy=user_agent&amp;timeCompare=1"><u>crawling traffic shares available in Data Explorer</u></a>, as well as in the raw data available via API endpoints (<a href="https://developers.cloudflare.com/api/resources/radar/subresources/bots/subresources/web_crawlers/methods/timeseries_groups/"><u>timeseries</u></a>, <a href="https://developers.cloudflare.com/api/resources/radar/subresources/bots/subresources/web_crawlers/methods/summary/"><u>summary</u></a>). In the crawler-centric view below, covering nearly the first four weeks of June 2025, we can see that the share of requests related to Google’s crawling activity for both their <code>Googlebot</code> and <code>GoogleOther</code> identifiers falls over the course of the month, with several peak/valley periods. A similar pattern <a href="https://radar.cloudflare.com/explorer?dataSet=http&amp;loc=as15169&amp;dt=2025-05-31_2025-06-27"><u>observed in HTTP request traffic from Google’s AS15169</u></a> during that same time period loosely matches this observed drop in share.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1K92yRMz57QrRH7iPvNH4V/0f7d7816fb3b22232dbee8359127b367/image11.png" />
          </figure><p>In addition, it appears that OpenAI’s <code>GPTBot</code> saw multiple periods where little-to-no crawling activity was observed throughout the month.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/sXdBr25Y4toS2t3nvPKMm/e1313d3356130bc333a2e03574e56661/image13.png" />
          </figure>
    <div>
      <h2>What this means for content providers</h2>
      <a href="#what-this-means-for-content-providers">
        
      </a>
    </div>
    <p>These ratios directly affect the viability of publishing content on the Internet. While they will vary over time, the trend is clear: more crawls, fewer referrals. Legacy search index crawlers would scan your content only a handful of times, at most, for each visitor they sent. Opening a site to those crawlers made its revenue model more viable, not less.</p><p>The new data we are observing suggests that is no longer the case. These models consume more content, more frequently, while sending the same amount of traffic, or less, back to the sources of that content.</p><p>We have <a href="https://blog.cloudflare.com/cloudflare-ai-audit-control-ai-content-crawlers/"><u>released new tools</u></a> over the last year to help site owners take back control. With a single click, publishers can <a href="https://www.cloudflare.com/learning/ai/how-to-block-ai-crawlers/">block the kinds of AI crawlers that train against their content</a>. And today, <a href="https://blog.cloudflare.com/introducing-pay-per-crawl"><u>we announced new ways</u></a> to make the exchange of value fair for both sides of the equation. However, we continue to recommend that content creators audit and then enforce their preferred policies for AI crawlers.</p>
    <div>
      <h2>One more thing…</h2>
      <a href="#one-more-thing">
        
      </a>
    </div>
    <p>In addition to providing these new insights around crawling and referral traffic and associated trends, we’ve also taken the opportunity to launch expanded Verified Bots content. The <a href="https://radar.cloudflare.com/bots"><u>Bots page on Cloudflare Radar</u></a> includes a paginated list of <a href="https://developers.cloudflare.com/bots/concepts/bot/verified-bots/"><u>Verified Bots</u></a>, displaying the bot name, owner, category, and rank (based on request volume). This list has now been expanded into a <a href="https://radar.cloudflare.com/bots/directory"><u>standalone directory in a new Bots section</u></a>. The directory, shown below, displays a card for each Verified Bot, showing the bot name, a description, the bot owner and category, and verification status. Users can search the directory by bot name, owner, or description, and can also filter by category (selecting just <i>Monitoring &amp; Analytics</i> bots, for example).</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7nTytFwnB1NVuwnAeAduX8/40efad4c333d8046d28a7ee44a8d91ca/image2.png" />
          </figure><p>Clicking on a bot name within a card brings up a bot-specific page that includes metadata about the bot, information on how the bot’s user agent is represented in <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/User-Agent"><u>HTTP request headers</u></a> and how it should be <a href="https://datatracker.ietf.org/doc/html/rfc9309#name-the-user-agent-line"><u>specified in robots.txt directives</u></a>, and a traffic graph that shows associated HTTP request volume trends for the selected time period (with a default comparison to the previous period). Associated data is also available via the <a href="https://developers.cloudflare.com/api/resources/radar/subresources/bots/"><u>API</u></a>. As we add additional information to these bot-specific pages in the future, we will document the updates in <a href="https://developers.cloudflare.com/changelog/?product=radar"><u>Changelog entries</u></a>.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1SY1pwRzVnvC1sFNANrPxx/003260c3fdd3792cdff55d3a95628592/image12.png" />
          </figure><p></p> ]]></content:encoded>
            <category><![CDATA[Pay Per Crawl]]></category>
            <category><![CDATA[Radar]]></category>
            <category><![CDATA[Internet Traffic]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Bots]]></category>
            <guid isPermaLink="false">2pLY5VumUNgntdcfkU9Ua3</guid>
            <dc:creator>David Belson</dc:creator>
            <dc:creator>Sam Rhea</dc:creator>
        </item>
        <item>
            <title><![CDATA[Control content use for AI training with Cloudflare’s managed robots.txt and blocking for monetized content]]></title>
            <link>https://blog.cloudflare.com/control-content-use-for-ai-training/</link>
            <pubDate>Tue, 01 Jul 2025 10:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare is making it easier for publishers and content creators of all sizes to prevent their content from being scraped for AI training by managing robots.txt on their behalf.  ]]></description>
            <content:encoded><![CDATA[ <p>Cloudflare is giving all website owners two new tools to easily control whether AI bots are allowed to access their content for model training. First, customers can let Cloudflare <b>create and manage a robots.txt file</b>, creating the appropriate entries to let crawlers know not to access their site for AI training. Second, all customers can choose a new option to <a href="https://www.cloudflare.com/learning/ai/how-to-block-ai-crawlers/">block AI bots</a> <b>only on portions of their site that are monetized through ads</b>.</p>
    <div>
      <h2>The new generation of AI crawlers</h2>
      <a href="#the-new-generation-of-ai-crawlers">
        
      </a>
    </div>
    <p>Creators that monetize their content by showing ads depend on traffic volume. Their livelihood is directly linked to the number of views their content receives. These creators have allowed crawlers on their sites for decades, for a simple reason: search crawlers such as <code>Googlebot</code> made their sites more discoverable, and drove more traffic to their content. Google benefitted from delivering better search results to their customers, and the site owners also benefitted through increased views, and therefore increased revenues.</p><p>But recently, a new generation of crawlers has appeared: bots that crawl sites to gather data for training AI models. While these crawlers operate in the same technical way as search crawlers, the relationship is no longer symbiotic. AI training crawlers use the data they ingest from content sites to answer questions for their own customers directly, within their own apps. They typically send much less traffic back to the site they crawled. Our <a href="https://radar.cloudflare.com/"><u>Radar</u></a> team did an analysis of crawls and referrals for sites behind Cloudflare. As HTML pages are arguably the most valuable content for these crawlers, we <a href="https://blog.cloudflare.com/ai-search-crawl-refer-ratio-on-radar/"><u>calculated crawl ratios</u></a> by dividing the total number of requests from relevant user agents associated with a given search or AI platform where the response was of <code>Content-type: text/html</code> by the total number of requests for HTML content where the <code>Referer</code>: header contained a hostname associated with a given search or AI platform. As of June 2025, we find that Google crawls websites about 14 times for every referral. But for AI companies, the <a href="https://radar.cloudflare.com/ai-insights#crawl-to-refer-ratio"><u>crawl-to-refer ratio</u></a> is orders of magnitude greater. In June 2025, <b>OpenAI’s crawl-to-referral ratio was 1,700:1, Anthropic’s 73,000:1</b>. 
This clearly breaks the “crawl in exchange for traffic” relationship that previously existed between search crawlers and publishers. (Please note that this calculation reflects our best estimate, recognizing that traffic referred by native apps may not always be attributed to a provider due to a lack of a <code>Referer</code>: header, which may affect the ratio.)</p><p>And while sites can use robots.txt to tell these bots not to crawl their site, most don’t take this first step. We found that only about <a href="https://radar.cloudflare.com/ai-insights#ai-user-agents-found-in-robotstxt"><b><u>37% of the top 10,000 domains currently have a robots.txt file</u></b></a>, showing that robots.txt is underutilized in this age of evolving crawlers.</p><p>That’s where Cloudflare comes in. Our mission is to help build a better Internet, and a better Internet is one with a huge thriving ecosystem of independent publishers. So, we’re taking action to keep that ecosystem alive.</p>
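The crawl-to-refer calculation described above can be sketched in a few lines. This is an illustrative back-of-the-envelope version, not Cloudflare's actual pipeline: the log-record fields (`user_agent`, `referer`, `content_type`) and the per-platform user-agent and hostname lists are assumptions for the example.

```python
# Illustrative crawl-to-refer ratio: HTML requests from a platform's crawlers,
# divided by HTML requests referred from that platform's properties.
from urllib.parse import urlparse

# Hypothetical platform mappings (assumptions, not a complete or official list).
PLATFORM_USER_AGENTS = {"openai": ("GPTBot", "OAI-SearchBot")}
PLATFORM_REFERER_HOSTS = {"openai": ("chatgpt.com", "openai.com")}

def crawl_to_refer_ratio(platform, records):
    """records: iterable of dicts with user_agent, referer, content_type."""
    crawls = refers = 0
    for r in records:
        if not r["content_type"].startswith("text/html"):
            continue  # only HTML responses count, per the methodology above
        if any(ua in r["user_agent"] for ua in PLATFORM_USER_AGENTS[platform]):
            crawls += 1
        host = urlparse(r.get("referer") or "").hostname or ""
        if any(host == h or host.endswith("." + h)
               for h in PLATFORM_REFERER_HOSTS[platform]):
            refers += 1
    return crawls / refers if refers else float("inf")
```

As the post notes, real-world ratios are estimates: traffic referred by native apps often arrives without a `Referer:` header, which inflates the computed ratio.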
    <div>
      <h2>Giving ALL customers full control</h2>
      <a href="#giving-all-customers-full-control">
        
      </a>
    </div>
    <p>Protecting content creators isn’t new for Cloudflare. In July 2024, we gave everyone on the Cloudflare network a simple way to <a href="https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/"><u>block all AI scrapers with a single click</u></a> for free. We’ve already seen <b>more than 1 million customers enable this feature</b>, which has given us some interesting data.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2B8KAmaP6DrMEMW5YSjLYP/d9eb0f67a998b730373a27aa707ade9d/image5.png" />
          </figure><p><code><b>Bytespider</b></code><b>, previously our top bot, has seen its traffic volume decline 71.45% since the first week of July 2024</b>. During the same time, we saw an increased number of <code>Bytespider</code> requests that customers chose to specifically block. In contrast, <code>GPTBot</code> traffic volume has grown significantly as it has become more popular, now even surpassing traffic we see from big traditional tech players like Amazon and ByteDance.</p><p>The share of sites accessed by particular crawlers has gone down across the board since our last update. Previously, <code>Bytespider</code> accessed &gt;40% of websites protected by Cloudflare, but that number has dropped to only 9.37%. <code><b>GPTBot</b></code><b> has taken the top spot for most sites accessed</b>, but while its request volume has grown significantly (noted above), the share of sites it crawls has actually decreased since last year from 35.46% to 28.97%, with an increase in customers blocking.</p><table><tr><td><p>AI Bot</p></td><td><p>Share of Websites Accessed</p></td></tr><tr><td><p>GPTBot</p></td><td><p>28.97%</p></td></tr><tr><td><p>Meta-ExternalAgent</p></td><td><p>22.16%</p></td></tr><tr><td><p>ClaudeBot</p></td><td><p>18.80%</p></td></tr><tr><td><p>Amazonbot</p></td><td><p>14.56%</p></td></tr><tr><td><p>Bytespider</p></td><td><p>9.37%</p></td></tr><tr><td><p>GoogleOther</p></td><td><p>9.31%</p></td></tr><tr><td><p>ImageSiftBot</p></td><td><p>4.45%</p></td></tr><tr><td><p>Applebot</p></td><td><p>3.77%</p></td></tr><tr><td><p>OAI-SearchBot</p></td><td><p>1.66%</p></td></tr><tr><td><p>ChatGPT-User</p></td><td><p>1.06%</p></td></tr></table><p>And while AI Search and AI Assistant crawling-related activity has exploded in popularity in the last 6 months, we still see their total traffic pale in comparison to AI training crawl activity, which has seen a <b>65% increase in traffic over the past 6 months</b>.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7nOWMQs8IzgS3RfrXHaVT1/b1b31024a92b70a3f39083b376bb3934/image4.png" />
          </figure><p>To this end, we launched <a href="https://blog.cloudflare.com/cloudflare-ai-audit-control-ai-content-crawlers/"><u>free granular auditing</u></a> in September 2024 to help customers understand which crawlers were accessing their content most often, and created simple templates to block all or specific crawlers. And in December 2024, we made it easy for publishers to automatically block <a href="https://blog.cloudflare.com/ai-audit-enforcing-robots-txt/"><u>crawlers that weren’t respecting robots.txt</u></a>. But we realized many sites didn’t have the time to create or manage their own robots.txt file. Today, we’re going two steps further.</p>
    <div>
      <h2>Step 1: fully managed robots.txt</h2>
      <a href="#step-1-fully-managed-robots-txt">
        
      </a>
    </div>
    <p>When it comes to managing your website’s visibility to search engine crawlers and other bots, the <code>robots.txt</code> file is a key player. This simple text file acts like a traffic controller, signaling to bots which parts of the website they should or should not access. We can think of <a href="https://www.cloudflare.com/learning/bots/what-is-robots-txt/"><u>robots.txt</u></a> as a "Code of Conduct" sign posted at a community pool, listing general dos and don'ts, according to the pool owner’s wishes. While the sign itself does not enforce the listed directives, well-behaved visitors will still read the sign and follow the instructions they see. On the other hand, poorly-behaved visitors who break the rules risk <a href="https://blog.cloudflare.com/ai-audit-enforcing-robots-txt/"><u>getting themselves banned</u></a>. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6oGxSRxy3sU88o4TZP7p42/aea1d7bbf5e57eb133ce8cdfae88dc37/image2.png" />
          </figure><p>What do these files actually look like? Take Google’s as an example, visible to anyone at <a href="https://www.google.com/robots.txt"><u>https://www.google.com/robots.txt</u></a>. Parsing its contents, you'll notice four directives in the set of instructions: <b>User-agent</b>, <b>Disallow</b>, <b>Allow</b>, and <b>Sitemap</b>. In a <code>robots.txt</code> file, the <b>User-agent</b> directive specifies which bots the rules apply to. The <b>Disallow</b> directive tells those bots which parts of the website they should avoid. In contrast, the <b>Allow</b> directive grants specific bots permission to access certain areas. Finally, the <a href="https://www.sitemaps.org/index.html"><b>Sitemap</b> directive</a> points a bot to the site’s sitemap, a list of the pages the owner wants crawled, so that it won’t miss any important pages. The <a href="https://www.ietf.org/"><u>Internet Engineering Task Force (IETF)</u></a> formalized the definition and language for the Robots Exclusion Protocol in <a href="https://datatracker.ietf.org/doc/html/rfc9309"><u>RFC 9309</u></a>, specifying the exact syntax and precedence of these directives. It also outlines how crawlers should handle errors or redirects while stressing that compliance is <i>voluntary</i> and does not constitute access control. </p>
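Put together, a minimal robots.txt using all four directives might look like the following. This is a generic illustration for a hypothetical example.com, not Google's actual file:

```
# Rules for any crawler without a more specific group below
User-agent: *
Disallow: /search          # keep bots out of this path
Allow: /search/about       # ...but allow this one page under it

# Rules scoped to a single crawler
User-agent: GPTBot
Disallow: /

Sitemap: https://www.example.com/sitemap.xml
```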
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/79JML5EIN1f4NVzRankehO/20a2c99ccaca62e7718c9d66bb8585d5/image10.png" />
          </figure><p>Website owners should have agency over AI bot activity on their websites. We mentioned that only 37% of the top 10,000 domains on Cloudflare even have a robots.txt file. Of those robots.txt files that do exist, few include Disallow directives for the <a href="https://radar.cloudflare.com/ai-insights#ai-bot-crawler-traffic"><i><u>top</u></i><u> AI Bots</u></a> that we see on a daily basis. For instance, as of publication, <a href="https://radar.cloudflare.com/explorer?dataSet=robots_txt&amp;groupBy=user_agents%2Fdirective&amp;filters=directive%253DDISALLOW"><code><u>GPTBot</u></code><u> is only disallowed in 7.8% of the robots.txt files</u></a> found for the top domains; <code>Google-Extended</code> only shows up in 5.6%; <code>anthropic-ai</code>, <code>PerplexityBot</code>, <code>ClaudeBot</code>, and <code>Bytespider</code> each show up in under 5%. Furthermore, the difference between the 7.8% of Disallow directives for <code>GPTBot</code> and the ~5% of Disallow directives for other major AI crawlers suggests a gap between the desire to <a href="https://www.cloudflare.com/learning/ai/how-to-prevent-web-scraping/">prevent your content from being used for AI model training</a> and the proper configuration that accomplishes this by calling out bots like <code>Google-Extended</code>. (After all, there’s more to stopping AI crawlers than disallowing <code>GPTBot</code>.)</p><p>Along with viewing the most active bots and crawlers, Cloudflare Radar also shares weekly updates on how websites are handling <a href="https://radar.cloudflare.com/ai-insights?cf_target_id=3D982CE3E88C4E32F9D4AA79E7869F7C#ai-user-agents-found-in-robotstxt"><u>AI bots in their robots.txt files</u></a>. 
We can examine two snapshots below, one from <a href="https://radar.cloudflare.com/ai-insights?dateStart=2025-06-23&amp;dateEnd=2025-06-24"><u>June 2025</u></a> and the other from <a href="https://radar.cloudflare.com/ai-insights?dateStart=2025-01-26&amp;dateEnd=2025-02-01"><u>January 2025</u></a>:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/30Wc2jLvDqSMBKF5QxU2yc/f18b44d8ba9d11687c0224b40cf12675/image6.png" />
          </figure><p><sub><i>Radar snapshot from the week of June 23, 2025, showing the top AI user agents mentioned in the Disallow directive in robots.txt files across the top 10,000 domains. The 3 bots with the highest number of Disallows are GPTBot, CCBot, and facebookexternalhit.</i></sub></p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/T9krKSMLRud7sYgG7ahei/8632afeba6d22baa304ae9fd901e187a/image9.png" />
          </figure><p><sub><i>Radar snapshot from the week of January 26, 2025, showing the top AI user agents mentioned in the Disallow directive in robots.txt files across the top 10,000 domains. The 3 bots with the highest number of Disallows are GPTBot, CCBot, and anthropic-ai.</i></sub></p><p>From the above data, we also observe that fewer than 100 new robots.txt files have been added among the top domains between January and June. One visually striking change is the ratio of dark blue to light blue: compared to January, there is a steep decrease in “Partially Disallowed” permissions; websites are now flat-out choosing “Fully Disallowed” for the top AI crawlers, including <code>GPTBot</code>, <code>CCBot</code>, and <code>Google-Extended</code>. This underscores the changing landscape of web crawling, particularly the relationship of trust between website owners and AI crawlers.</p>
    <div>
      <h3>Putting up a guardrail with Cloudflare’s managed robots.txt</h3>
      <a href="#putting-up-a-guardrail-with-cloudflares-managed-robots-txt">
        
      </a>
    </div>
    <p>Many website owners have told us they’re in a tricky spot in this new era of AI crawlers. They’ve poured time and effort into creating original content, have published it on their own sites, and naturally want it to reach as many people as possible. To do that, website owners make their sites accessible to search engine crawlers, which index the content and make it discoverable in search results. But with the rise of AI-powered crawlers, that same content is now being scraped not just for indexing, but also to train AI models, often without the creator’s explicit consent. Take <code>Googlebot</code>, for example: it’s an absolute requirement for most website owners to allow for SEO. But Google crawls with user agent <code>Googlebot</code> for both SEO <i>and</i> AI training purposes. Specifically disallowing <a href="https://developers.google.com/search/docs/crawling-indexing/google-common-crawlers#google-extended"><code><u>Google-Extended</u></code></a> (but not <code>Googlebot</code>) in your robots.txt file is what communicates to Google that you do not want your content to be crawled to feed AI training.</p><p>So, what if you don’t want your content to serve as training data for the next AI model, but don’t have the time to manually maintain an up-to-date robots.txt file? <b>Enter Cloudflare’s new managed robots.txt offering.</b> Once enabled, Cloudflare will automatically update your existing robots.txt or create a robots.txt file on your site that includes directives asking popular AI bot operators to not use your content for AI model training. For instance, <b>Cloudflare’s managed robots.txt signals your preference to </b><code><b>Google-Extended</b></code><b> and </b><a href="https://support.apple.com/en-us/119829"><code><b><u>Applebot-Extended</u></b></code></a><b>, amongst others, that they should not crawl your site for AI training,</b> while keeping your domain(s) SEO-friendly.</p>
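Conceptually, the managed directives say "do not use this site for AI training" to the training-specific extended user agents, while leaving ordinary search crawlers untouched. An illustrative sketch of what such entries look like (this is an assumption for explanation, not the literal contents of Cloudflare's managed file):

```
# Illustrative only: not the literal contents of Cloudflare's managed file
User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

# Googlebot itself is not disallowed, so search indexing (SEO) is unaffected
```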
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2SLxL9LMN1IK2WXOIq8ezP/786db3e1cbc24b1cce4c337b8136d3a7/image3.png" />
          </figure><p><sup><i>Cloudflare dashboard snapshot of the new managed robots.txt activation toggle </i></sup></p><p>This feature is available to all customers, meaning anyone can <a href="https://developers.cloudflare.com/bots/additional-configurations/managed-robots-txt/"><u>enable this today</u></a> from the Cloudflare dashboard. Once enabled, website owners who previously had no robots.txt file will now have Cloudflare’s managed bot directives live on their website. What about website owners who already have a robots.txt file? The contents of Cloudflare’s managed robots.txt will be <i>prepended</i> to site owners’ existing file. This way, their existing Disallow directives – and the time and rationale put into customizing this file – are honored, while still ensuring the website has AI crawler guardrails managed by Cloudflare.</p><p>As the AI bot landscape changes with new bots on the rise, Cloudflare will keep our customers a step ahead by updating the directives in our managed robots.txt, so they don’t have to worry about maintaining things on their own. Once enabled, customers won’t need to take any action for updates to the managed robots.txt content to go live on their site. </p><p>We believe that managing crawling is key to protecting the open Internet, so we’ll also be encouraging every new site that onboards to Cloudflare to enable our managed robots.txt. When you onboard a new site, you’ll see the following options for managing AI crawlers:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6l4RpmHHf0OGP44XyDnZra/66c30bb8080d3107ab93af55dc6a8c6e/Screenshot_2025-06-30_at_3.59.54%C3%A2__PM.png" />
          </figure><p>This makes it effortless to ensure that <b>every new customer or domain onboarded to Cloudflare gives clear directives about how they want their content used.</b></p>
    <div>
      <h3>Under the hood: technical implementation</h3>
      <a href="#under-the-hood-technical-implementation">
        
      </a>
    </div>
    <p>To implement this feature, we developed a new module that intercepts all inbound HTTP requests for <code>/robots.txt</code>. For all such requests, we’ll check whether the zone has opted in to use Cloudflare’s managed robots.txt by reading a value from our <a href="https://blog.cloudflare.com/introducing-quicksilver-configuration-distribution-at-internet-scale/"><u>distributed key-value store</u></a>. If it has, the module responds with Cloudflare’s managed robots.txt directives, prepended to the origin’s robots.txt if an existing file is present. We prepend so we can add a generalized header that instructs all bots on the customer’s preferences for data use, as defined in the <a href="https://www.ietf.org/archive/id/draft-it-aipref-attachment-00.html#name-introduction"><u>IETF AI preferences proposal</u></a>. Note that in robots.txt, the <a href="https://datatracker.ietf.org/doc/html/rfc9309#section-2.2.2"><u>most specific match</u></a> <i>must</i> always be used, and since our disallow expressions are scoped to cover everything, we can ensure a directive we prepend will never conflict with a more targeted customer directive. If the customer has <i>not</i> enabled this feature, the request is forwarded to the origin server as usual, using whatever the customer has written in their own robots.txt file. (While caching the origin’s robots.txt could reduce latency by eliminating a round trip to the origin, the impact on overall page load times would be minimal, as robots.txt requests comprise a small fraction of total traffic. Adding cache update/invalidation would introduce complexity with limited benefit, so we prioritized functionality and reliability in our implementation.)</p>
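The interception flow above can be sketched as follows. This is a hypothetical stand-in, not Cloudflare's internal module: the handler, the key-value store interface, and the managed directive text are all assumptions for illustration.

```python
# Hypothetical sketch of the /robots.txt interception logic described above.
MANAGED_DIRECTIVES = (
    "# Managed AI-training crawler guardrails (illustrative placeholder)\n"
    "User-agent: Google-Extended\n"
    "Disallow: /\n"
)

def serve_robots_txt(zone_id, kv_store, fetch_origin):
    """kv_store: maps zone_id -> opt-in flag (assumed interface).
    fetch_origin(): returns the origin's robots.txt body, or None if absent."""
    if not kv_store.get(zone_id):
        # Feature disabled: pass the request through to the origin untouched.
        return fetch_origin() or ""
    origin_body = fetch_origin()
    if origin_body is None:
        return MANAGED_DIRECTIVES  # site had no robots.txt at all
    # Prepend, so existing customer directives survive; RFC 9309's
    # most-specific-match rule keeps the two sets from conflicting.
    return MANAGED_DIRECTIVES + "\n" + origin_body
```

The key design point mirrored here is prepending rather than replacing: the customer's own rules are always preserved verbatim after the managed block.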
    <div>
      <h2>Step 2: block, but only where you show ads</h2>
      <a href="#step-2-block-but-only-where-you-show-ads">
        
      </a>
    </div>
    <p>Adding an entry to your robots.txt file is the first step to telling AI bots not to crawl you. But robots.txt is an honor system. Nothing forces bots to follow it. That’s why we introduced our <a href="https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/"><u>one-click managed rule</u></a> to block all AI bots across your zone. However, some customers want AI bots to visit certain pages, like developer or support documentation. For customers who are hesitant to block everywhere, we have a brand-new option: let us detect when ads are shown on a hostname, and we will block AI bots ONLY on that hostname. Here’s how we do it.</p><p>First, we use multiple techniques to identify if a request is coming from an AI bot. The easiest technique is to identify well-behaved crawlers that publicly declare their user agent and use dedicated IP ranges. Often we work directly with these bot makers to add them to our <a href="https://radar.cloudflare.com/traffic/verified-bots"><u>Verified Bot list</u></a>.</p><p>Many bot operators act in good faith by publicly publishing their user agents, or even <a href="https://blog.cloudflare.com/verified-bots-with-cryptography/"><u>cryptographically verifying their bot requests</u></a> directly with Cloudflare. Unfortunately, some attempt to appear like a real browser by using a spoofed user agent. Our global machine learning models have long recognized this activity as bot traffic, even when operators lie about their user agent. When bad actors attempt to crawl websites at scale, they generally use tools and frameworks that we’re able to fingerprint, and we draw on Cloudflare’s network, which serves over 57 million requests per second on average, to understand how much we should trust each fingerprint. 
We compute global aggregates across many signals, and based on these signals, our models are able to consistently and <a href="https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/"><u>appropriately flag traffic from evasive AI bots</u></a>.</p><p>When we see a request from an AI bot, our system checks if we have previously identified ads in the response served by the target page. To do this, we inspect the “response body” — the raw HTML code of the web page being sent back.  After parsing the HTML document, we perform a comprehensive scan for code patterns commonly found in <a href="https://support.google.com/adsense/answer/9183549?hl=en#:~:text=An%20ad%20unit%20is%20one,flexibility%20in%20terms%20of%20customization."><u>ad units</u></a>, which signals to us that the page is serving an ad. Examples of such code would be:</p>
            <pre><code>&lt;div class="ui-advert" data-role="advert-unit" data-testid="advert-unit" data-ad-format="takeover" data-type="" data-label="" style=""&gt;
&lt;script&gt;
....
&lt;/script&gt;
&lt;/div&gt;</code></pre>
            <p>Here, the <code>div</code> container has the <code>ui-advert</code> class commonly used for advertising. Similarly, links to widely used ad servers like Google Syndication are another good signal, such as the following:</p>
            <pre><code>&lt;link rel="dns-prefetch" href="https://pagead2.googlesyndication.com/"&gt;

&lt;script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-1234567890123456" crossorigin="anonymous"&gt;&lt;/script&gt;</code></pre>
            <p>By streaming and directly parsing small chunks of the response using our ultra-fast <a href="https://blog.cloudflare.com/html-parsing-2/#lol-html"><u>LOL HTML parser</u></a>, we can perform scans without adding any latency to the inspected response.</p><p>So as not to reinvent the wheel, we are adopting techniques similar to those that ad blockers have been using for years. Ad blockers fundamentally perform two separate tasks to block advertisements in a browser. The first is to block the browser from fetching resources from ad servers, and the second is to suppress displaying HTML elements that contain ads. For this, ad blockers rely on large filter lists such as <a href="https://easylist.to/index.html"><u>EasyList</u></a>. These lists contain both so-called URL block filters, which match outgoing request URLs against a set of patterns and block any that match, and CSS selectors designed to match HTML ad elements.</p><p>We can use both of these techniques to detect if an HTML response contains ads by checking external resources (e.g. content referenced by HREF or SCRIPT tags) against URL block filters, and the HTML elements themselves against CSS selectors. Because we do not actually need to block every single advertisement on a site, but rather detect the overall presence of ads on a site, we can achieve the same detection efficacy while shrinking the number of CSS and URL filters from more than 40,000 in EasyList down to the 400 most commonly seen ones, increasing our computational efficiency.</p><p>Because some sites load ads dynamically rather than directly in the returned HTML (partially to avoid ad blocking), we enrich this first information source with data from <a href="https://developers.cloudflare.com/fundamentals/reference/policies-compliances/content-security-policies/"><u>Content Security Policy (CSP)</u></a> reports. 
The Content Security Policy standard is a security mechanism that helps web developers control the resources (like scripts, stylesheets, and images) a browser is allowed to load for a specific web page, and browsers send reports about loaded resources to a CSP management system, which for many sites is Cloudflare’s <a href="https://developers.cloudflare.com/page-shield/"><u>Page Shield</u></a> product. These reports allow us to relate scripts loaded from ad servers directly with page URLs. Both of these information sources are consumed by our <a href="https://www.cloudflare.com/en-gb/learning/security/glossary/what-is-endpoint/"><u>endpoint management service</u></a>, which then matches incoming requests against hostnames that we already know are serving ads.</p><p>We do all of this on every request for any customer who opts in, even free customers. </p><p>To enable this feature, simply navigate to the <a href="https://dash.cloudflare.com/?to=/:account/:zone/security/bots/configure"><u>Security &gt; Settings &gt; Bots</u></a> section of the Cloudflare dashboard, and choose either <code>Block on pages with Ads</code> or <code>Block Everywhere</code>.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/yoGKnsD7fuG9K8MysCMHl/91fb4bb69625d8c85a8dcf4cfb21f6de/unnamed__1_.png" />
          </figure>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/64xCpJrlgY1WtsNI0CeeT5/975e6a329b605e11445faafa038181aa/unnamed__2_.png" />
          </figure>
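The ad-detection approach described above (streaming chunks of HTML and matching elements against a small filter list of ad-related class names and ad-server hostnames) can be sketched in miniature. This is a toy illustration under assumed filter entries, not the production parser or filter set:

```python
# Toy sketch of streamed ad detection using Python's stdlib HTML parser.
from html.parser import HTMLParser

AD_CLASS_TOKENS = {"ui-advert", "adsbygoogle"}    # toy CSS-style filters
AD_SERVER_HOSTS = ("googlesyndication.com",)      # toy URL block filters

class AdDetector(HTMLParser):
    """Flips `found` when an ad-like element or ad-server URL appears."""
    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        classes = set((a.get("class") or "").split())
        if classes & AD_CLASS_TOKENS:
            self.found = True   # element matches an ad class filter
        url = a.get("src") or a.get("href") or ""
        if any(h in url for h in AD_SERVER_HOSTS):
            self.found = True   # resource loaded from a known ad server

def page_serves_ads(html_chunks):
    det = AdDetector()
    for chunk in html_chunks:   # consume the response as streamed chunks
        det.feed(chunk)
        if det.found:
            return True         # stop early once any ad signal is seen
    return False
```

As in the post, only the overall presence of ads matters, so the scan can stop at the first match instead of cataloguing every ad on the page.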
    <div>
      <h2>The AI bot hunt: finding and identifying bots</h2>
      <a href="#the-ai-bot-hunt-finding-and-identifying-bots">
        
      </a>
    </div>
    <p>The AI bot landscape has exploded, and it continues to grow rapidly as more and more operators come online. At Cloudflare, our team of security researchers is constantly identifying and classifying different AI-related crawlers and scrapers across our network. </p><p>There are two major ways in which we track AI bots and identify those that are poorly behaved:</p><p>1. Our customers play a crucial role by directly submitting reports of misbehaving AI bots that may not yet be classified by Cloudflare. (If you have an AI bot that comes to mind here, we’d love for you to let us know through our <a href="https://docs.google.com/forms/d/14bX0RJH_0w17_cAUiihff5b3WLKzfieDO4upRlo5wj8/"><u>bots submission form</u></a> today.) Once such a bot comes to our attention, our security analysts investigate to determine how it should be categorized.</p><p>2. We derive insights by analyzing the massive scale of customer traffic that we observe. Specifically, we can see which AI agents visit which websites and when, drawing out trends or patterns that might make a website owner want to disallow a given AI bot. This bird’s-eye view of abusive AI bot behavior was paramount as we started to determine the content of a managed robots.txt.</p>
    <div>
      <h2>What’s next?</h2>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>Our new <a href="https://developers.cloudflare.com/bots/additional-configurations/managed-robots-txt/"><u>managed robots.txt</u></a> feature and the option to block AI bots on pages with ads are available to <i>all Cloudflare customers</i>, including everyone on a Free plan. We encourage you to start using them today to take control over how the content on your website gets used. Looking ahead, Cloudflare will monitor the <a href="https://ietf-wg-aipref.github.io/drafts/draft-ietf-aipref-vocab.html"><u>IETF’s pending proposal</u></a> allowing website publishers to control how automated systems use their content and update our managed robots.txt accordingly. We will also continue to provide more granular control around AI bot management and investigate new distinguishing signals as AI bots become more and more sophisticated. And if you’ve seen suspicious behavior from an AI scraper, contribute to the Internet ecosystem by <a href="https://docs.google.com/forms/d/14bX0RJH_0w17_cAUiihff5b3WLKzfieDO4upRlo5wj8/"><u>letting us know</u></a>!</p> ]]></content:encoded>
            <category><![CDATA[Pay Per Crawl]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[Impact]]></category>
            <guid isPermaLink="false">44HBJInoaQRMqVRmSaqjg6</guid>
            <dc:creator>Jin-Hee Lee</dc:creator>
            <dc:creator>Dipunj Gupta</dc:creator>
            <dc:creator>Brian Mitchell</dc:creator>
            <dc:creator>Reid Tatoris</dc:creator>
            <dc:creator>Henry Clausen</dc:creator>
        </item>
        <item>
            <title><![CDATA[Introducing pay per crawl: Enabling content owners to charge AI crawlers for access]]></title>
            <link>https://blog.cloudflare.com/introducing-pay-per-crawl/</link>
            <pubDate>Tue, 01 Jul 2025 10:00:00 GMT</pubDate>
            <description><![CDATA[ Pay per crawl is a new feature to allow content creators to charge AI crawlers for access to their content.  ]]></description>
            <content:encoded><![CDATA[ 
    <div>
      <h2>A changing landscape of consumption </h2>
      <a href="#a-changing-landscape-of-consumption">
        
      </a>
    </div>
    <p>Many publishers, content creators and website owners currently feel like they have a binary choice — either leave the front door wide open for AI to consume everything they create, or create their own walled garden. But what if there was another way?</p><p>At Cloudflare, we started from a simple principle: we wanted content creators to have control over who accesses their work. If a creator wants to <a href="https://www.cloudflare.com/learning/ai/how-to-block-ai-crawlers/">block all AI crawlers</a> from their content, they should be able to do so. If a creator wants to allow some or all AI crawlers full access to their content for free, they should be able to do that, too. Creators should be in the driver’s seat.</p><p>After hundreds of conversations with news organizations, publishers, and large-scale social media platforms, we heard a consistent desire for a third path: They’d like to allow AI crawlers to access their content, but they’d like to get compensated. Currently, that requires knowing the right individual and striking a one-off deal, which is an insurmountable challenge if you don’t have scale and leverage. </p>
    <div>
      <h2>What if I could charge a crawler? </h2>
      <a href="#what-if-i-could-charge-a-crawler">
        
      </a>
    </div>
    <p>We believe your choice need not be binary — there should be a third, more nuanced option: <b>You can charge for access.</b> Instead of a blanket block or uncompensated open access, we want to empower content owners to monetize their content at Internet scale.</p><p>We’re excited to help dust off a mostly forgotten piece of the web: <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/402"><b><u>HTTP response code 402</u></b></a>.</p>
    <div>
      <h2>Introducing pay per crawl</h2>
      <a href="#introducing-pay-per-crawl">
        
      </a>
    </div>
    <p><a href="http://www.cloudflare.com/paypercrawl-signup/">Pay per crawl</a>, in private beta, is our first experiment in this area. </p><p>Pay per crawl integrates with existing web infrastructure, leveraging <a href="https://www.cloudflare.com/learning/ddos/glossary/hypertext-transfer-protocol-http/">HTTP status codes</a> and established authentication mechanisms to create a framework for paid content access. </p><p>Each time an AI crawler requests content, they either present payment intent via request headers for successful access (<a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/200"><u>HTTP response code 200</u></a>), or receive a <code>402 Payment Required</code> response with pricing. Cloudflare acts as the Merchant of Record for pay per crawl and also provides the underlying technical infrastructure.</p>
    <div>
      <h3>Publisher controls and pricing</h3>
      <a href="#publisher-controls-and-pricing">
        
      </a>
    </div>
    <p>Pay per crawl grants domain owners full control over their monetization strategy. They can define a flat, per-request price across their entire site. Publishers will then have three distinct options for a crawler:</p><ul><li><p><b>Allow:</b> Grant the crawler free access to content.</p></li><li><p><b>Charge:</b> Require payment at the configured, domain-wide price.</p></li><li><p><b>Block:</b> Deny access entirely, with no option to pay.</p></li></ul>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2PhxxI7f3Teb521mPRFQUL/1ecfd01f60f165b35c27ab9457f8b152/image3.png" />
          </figure><p>An important mechanism here is that even if a crawler doesn’t have a billing relationship with Cloudflare, and thus couldn’t be charged for access, a publisher can still choose to ‘charge’ them. This is the functional equivalent of a network level block (an HTTP <code>403 Forbidden</code> response where no content is returned) — but with the added benefit of telling the crawler there could be a relationship in the future. </p><p>While publishers currently can define a flat price across their entire site, they retain the flexibility to bypass charges for specific crawlers as needed. This is particularly helpful if you want to allow a certain crawler through for free, or if you want to negotiate and execute a content partnership outside the pay per crawl feature. </p><p>To ensure integration with each publisher’s existing security posture, Cloudflare enforces Allow or Charge decisions via a rules engine that operates only after existing WAF policies and <a href="https://www.cloudflare.com/learning/bots/what-is-bot-management/">bot management</a> or bot blocking features have been applied.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3NI9GUkR8RmmApQyOgb1mI/4f77c199ccdc5ebc166204cdaec72c48/image2.png" />
          </figure>
    <div>
      <h3>Payment headers and access</h3>
      <a href="#payment-headers-and-access">
        
      </a>
    </div>
    <p>As we were building the system, we knew we had to solve a critical technical challenge: ensuring we could charge a specific crawler, while preventing anyone from spoofing that crawler. Thankfully, there’s a way to do this using <a href="https://developers.cloudflare.com/bots/concepts/bot/verified-bots/web-bot-auth/"><u>Web Bot Auth</u></a> proposals.</p><p>For crawlers, <a href="https://blog.cloudflare.com/web-bot-auth/"><u>this involves:</u></a></p><ul><li><p>Generating an Ed25519 key pair and making the <a href="https://datatracker.ietf.org/doc/html/rfc7517"><u>JWK</u></a>-formatted public key available in a hosted directory.</p></li><li><p>Registering with Cloudflare to provide the URL of your key directory and user agent information.</p></li><li><p>Configuring your crawler to use <a href="https://datatracker.ietf.org/doc/rfc9421/"><u>HTTP Message Signatures</u></a> with each request.</p></li></ul><p>Once registration is accepted, crawler requests should always include <code>signature-agent</code>, <code>signature-input</code>, and <code>signature</code> headers to identify your crawler and discover paid resources.</p>
            <pre><code>GET /example.html
Signature-Agent: "https://signature-agent.example.com"
Signature-Input: sig2=("@authority" "signature-agent")
 ;created=1735689600
 ;keyid="poqkLGiymh_W0uP6PZFw-dvez3QJT5SolqXBCW38r0U"
 ;alg="ed25519"
 ;expires=1735693200
 ;nonce="e8N7S2MFd/qrd6T2R3tdfAuuANngKI7LFtKYI/vowzk4lAZYadIX6wW25MwG7DCT9RUKAJ0qVkU0mEeLElW1qg=="
 ;tag="web-bot-auth"
Signature: sig2=:jdq0SqOwHdyHr9+r5jw3iYZH6aNGKijYp/EstF4RQTQdi5N5YYKrD+mCT1HA1nZDsi6nJKuHxUi/5Syp3rLWBA==:</code></pre>
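<p>Conceptually, the <code>signature</code> value above is produced by signing a canonical “signature base” derived from the covered components (here <code>@authority</code> and <code>signature-agent</code>), as defined in HTTP Message Signatures (RFC 9421). The sketch below builds a simplified version of that base string; the host and values are illustrative, and a real implementation must also cover the nonce, key ID, and expiry parameters shown above and perform the actual Ed25519 signature.</p>

```python
# Simplified sketch of the RFC 9421 "signature base" that a crawler's
# Ed25519 key signs in Web Bot Auth. Values are illustrative.

def signature_base(components: dict, params: str) -> str:
    """One line per covered component, then the @signature-params line."""
    lines = [f'"{name}": {value}' for name, value in components.items()]
    lines.append(f'"@signature-params": {params}')
    return "\n".join(lines)

params = '("@authority" "signature-agent");created=1735689600;alg="ed25519";tag="web-bot-auth"'
base = signature_base(
    {"@authority": "example.com",
     "signature-agent": '"https://signature-agent.example.com"'},
    params,
)
```

<p>Because the signed base includes <code>@authority</code> and the signature agent URL, a request replayed against a different host, or by a party without the private key, fails verification, which is what prevents spoofing.</p>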
            
    <div>
      <h3>Accessing paid content</h3>
      <a href="#accessing-paid-content">
        
      </a>
    </div>
    <p>Once a crawler is set up, it can determine whether content requires payment via two flows:</p>
    <div>
      <h4>Reactive (discovery-first)</h4>
      <a href="#reactive-discovery-first">
        
      </a>
    </div>
    <p>Should a crawler request a paid URL, Cloudflare returns an <code>HTTP 402 Payment Required</code> response, accompanied by a <code>crawler-price</code> header. This signals that payment is required for the requested resource.</p>
            <pre><code>HTTP 402 Payment Required
crawler-price: USD XX.XX</code></pre>
            <p> The crawler can then decide to retry the request, this time including a <code>crawler-exact-price</code> header to indicate agreement to pay the configured price.</p>
            <pre><code>GET /example.html
crawler-exact-price: USD XX.XX </code></pre>
            
    <div>
      <h4>Proactive (intent-first)</h4>
      <a href="#proactive-intent-first">
        
      </a>
    </div>
    <p>Alternatively, a crawler can preemptively include a <code>crawler-max-price</code> header in its initial request.</p>
            <pre><code>GET /example.html
crawler-max-price: USD XX.XX</code></pre>
            <p>If the price configured for a resource is equal to or below this specified limit, the request proceeds, and the content is served with a successful <code>HTTP 200 OK</code> response, confirming the charge:</p>
            <pre><code>HTTP 200 OK
crawler-charged: USD XX.XX 
server: cloudflare</code></pre>
    <p>If the amount in a <code>crawler-max-price</code> request is greater than the content owner’s configured price, only the configured price is charged. However, if the resource’s configured price exceeds the maximum price offered by the crawler, an <code>HTTP 402 Payment Required</code> response is returned, indicating the specified cost. Only a single price declaration header, <code>crawler-exact-price</code> or <code>crawler-max-price</code>, may be used per request.</p><p>The <code>crawler-exact-price</code> or <code>crawler-max-price</code> headers explicitly declare the crawler's willingness to pay. If all checks pass, the content is served, and the crawl event is logged. If any aspect of the request is invalid, the edge returns an <code>HTTP 402 Payment Required</code> response.</p>
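<p>The pricing rules above can be sketched as a small decision function. This is a hypothetical helper for illustration, not Cloudflare’s implementation; prices are per-request USD amounts.</p>

```python
# Sketch of the proactive (crawler-max-price) pricing decision described
# above. Hypothetical helper; not Cloudflare's actual code.

def settle(configured_price: float, max_price_offered: float):
    """Return (HTTP status, amount charged) for a crawler-max-price offer."""
    if max_price_offered >= configured_price:
        # Even if the offer is higher, only the configured price is charged.
        return 200, configured_price
    # Offer below the configured price: payment required, no content served.
    return 402, None
```

<p>For example, an offer of 0.10 against a configured price of 0.05 is served with a 200 and charged 0.05, while an offer of 0.01 receives a 402.</p>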
    <div>
      <h3>Financial settlement</h3>
      <a href="#financial-settlement">
        
      </a>
    </div>
    <p>Crawler operators and content owners must configure pay per crawl payment details in their Cloudflare account. Billing events are recorded each time a crawler makes an authenticated request with payment intent and receives an HTTP 200-level response with a <code>crawler-charged</code> header. Cloudflare then aggregates all the events, charges the crawler, and distributes the earnings to the publisher.</p>
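<p>The aggregation step can be pictured with a toy example. The event fields and amounts below are hypothetical, not Cloudflare’s actual billing schema:</p>

```python
# Toy aggregation of pay per crawl billing events into per-crawler charges
# and per-publisher earnings. One event per 200-level response that carried
# a crawler-charged header; amounts in cents to avoid float rounding.
from collections import defaultdict

events = [
    {"crawler": "examplebot", "publisher": "site-a.example", "cents": 5},
    {"crawler": "examplebot", "publisher": "site-b.example", "cents": 2},
    {"crawler": "otherbot", "publisher": "site-a.example", "cents": 5},
]

charges = defaultdict(int)   # billed to each crawler operator
earnings = defaultdict(int)  # distributed to each publisher
for e in events:
    charges[e["crawler"]] += e["cents"]
    earnings[e["publisher"]] += e["cents"]
```
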
    <div>
      <h2>Content for crawlers today, agents tomorrow </h2>
      <a href="#content-for-crawlers-today-agents-tomorrow">
        
      </a>
    </div>
    <p>At its core, pay per crawl begins a technical shift in how content is controlled online. By providing creators with a robust, programmatic mechanism for valuing and controlling their digital assets, we empower them to continue creating the rich, diverse content that makes the Internet invaluable. </p><p>We expect pay per crawl to evolve significantly. It’s very early: we believe many different types of interactions and marketplaces can and should develop simultaneously. We are excited to support these various efforts and open standards.</p><p>For example, a publisher or news organization might want to charge different rates for different paths or content types. How do you introduce dynamic pricing based not only upon demand, but also how many users your AI application has? How do you introduce granular licenses at Internet scale, whether for training, <a href="https://www.cloudflare.com/learning/ai/inference-vs-training/">inference</a>, search, or something entirely new?</p><p>The true potential of pay per crawl may emerge in an <a href="https://www.cloudflare.com/learning/ai/what-is-agentic-ai/">agentic</a> world. What if an agentic paywall could operate entirely programmatically? Imagine asking your favorite deep research program to help you synthesize the latest cancer research or a legal brief, or just help you find the best restaurant in Soho — and then giving that agent a budget to spend to acquire the best and most relevant content. By anchoring our first solution on <b>HTTP response code 402</b>, we enable a future where intelligent agents can programmatically negotiate access to digital resources. </p>
    <div>
      <h2>Getting started</h2>
      <a href="#getting-started">
        
      </a>
    </div>
    <p>Pay per crawl is currently in private beta. We’d love to hear from you if you’re either a crawler interested in paying to access content or a content creator interested in charging for access. You can reach out to us at <a href="http://www.cloudflare.com/paypercrawl-signup/"><u>http://www.cloudflare.com/paypercrawl-signup/</u></a> or contact your Account Executive if you’re an existing Enterprise customer.</p> ]]></content:encoded>
            <category><![CDATA[Pay Per Crawl]]></category>
            <category><![CDATA[AI Bots]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Bot Management]]></category>
            <guid isPermaLink="false">7AJ8tUOFDvk5mCTrDjBPDq</guid>
            <dc:creator>Will Allen</dc:creator>
            <dc:creator>Simon Newton</dc:creator>
        </item>
        <item>
            <title><![CDATA[From Googlebot to GPTBot: who’s crawling your site in 2025]]></title>
            <link>https://blog.cloudflare.com/from-googlebot-to-gptbot-whos-crawling-your-site-in-2025/</link>
            <pubDate>Tue, 01 Jul 2025 10:00:00 GMT</pubDate>
            <description><![CDATA[ From May 2024 to May 2025, crawler traffic rose 18%, with GPTBot growing 305% and Googlebot 96%. ]]></description>
            <content:encoded><![CDATA[ <p><a href="https://www.cloudflare.com/learning/bots/what-is-a-web-crawler/"><u>Web crawlers</u></a> are not new. The <a href="https://en.wikipedia.org/wiki/World_Wide_Web_Wanderer"><u>World Wide Web Wanderer</u></a> debuted in 1993, though the first web search engines to truly use crawlers and indexers were <a href="https://en.wikipedia.org/wiki/JumpStation"><u>JumpStation</u></a> and <a href="https://en.wikipedia.org/wiki/WebCrawler"><u>WebCrawler</u></a>. Crawlers are part of one of the backbones of the Internet’s success: search. Their main purpose has been to index the content of websites across the Internet so that those websites can appear in search engine results and direct users appropriately. In this blog post, we’re analyzing recent trends in web crawling, which now has a crucial and complex new role with the rise of AI.</p><p>Not all crawlers are the same. Bots, automated scripts that perform tasks across the Internet, come in many forms: those considered non-threatening or “<a href="https://www.cloudflare.com/learning/bots/how-to-manage-good-bots/"><u>good</u></a>” (such as API clients, search indexing bots like Googlebot, or health checkers) and those considered malicious or “<a href="https://www.cloudflare.com/learning/bots/how-to-manage-good-bots/"><u>bad</u></a>” (like those used for credential stuffing, spam, or <a href="https://www.cloudflare.com/learning/ai/how-to-prevent-web-scraping/">scraping content without permission</a>). In fact, around 30% of global web traffic today, according to <a href="https://radar.cloudflare.com/traffic?dateRange=52w#bot-vs-human"><u>Cloudflare Radar data</u></a>, comes from bots, and even exceeds human Internet traffic in some locations.</p><p>A new category, AI crawlers, has emerged in recent years. 
These bots collect data from across the web to train AI models, improving tools and experiences, but also <a href="https://en.wikipedia.org/wiki/Artificial_intelligence_and_copyright"><u>raising issues around content rights</u></a>, unauthorized use, and infrastructure overload. We aimed to confirm the growth of both search and AI crawlers, examine specific AI crawlers, and understand broader crawler usage.</p><p>This is increasingly relevant with the rapid adoption of AI, growing content rights concerns, and data privacy discussions. Some sites and creators are looking to <a href="https://www.cloudflare.com/learning/ai/how-to-block-ai-crawlers/">limit or block AI crawlers</a> using tools like <code>robots.txt</code> or <a href="https://blog.cloudflare.com/bringing-ai-to-cloudflare/#enabling-dynamic-updates-for-the-ai-bot-rule"><u>firewall rules</u></a>. Others, like Dutch indie maker and entrepreneur <a href="https://x.com/levelsio/status/1916626339924267319"><u>Pieter Levels</u></a>, have embraced them: “<i>I’m 100% fine with AI crawlers… very important to rank in LLMs [large language models]</i>”.</p><p>It’s important to note that crawlers serve different purposes. For example, the <code>facebookexternalhit</code> bot is not included in this analysis, as it is used by Facebook to fetch page content when generating previews for shared links. However, within this post, we are only focusing on AI and search crawlers that are indexing or scraping website content.</p>
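<p>For reference, a minimal <code>robots.txt</code> of the kind used to limit AI crawlers might look like the following. The user-agent tokens are the operators’ published names, and honoring these directives is voluntary on the crawler’s part:</p>

```text
# Disallow two AI crawlers; all other crawlers remain unaffected
User-agent: GPTBot
Disallow: /

User-agent: Bytespider
Disallow: /
```
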
    <div>
      <h2>AI-only crawlers perspective</h2>
      <a href="#ai-only-crawlers-perspective">
        
      </a>
    </div>
    <p>Let’s start with an AI-only crawler perspective that we currently have on <a href="https://radar.cloudflare.com/explorer?dataSet=ai.bots&amp;dt=12w"><u>Cloudflare Radar</u></a>, focused only on crawlers advertised as AI-related. To identify them, we’re using a <a href="https://github.com/ai-robots-txt/ai.robots.txt/blob/main/robots.json"><u>list</u></a> derived from an open-source project that helps website owners manage and control access to AI crawlers — especially those used to train large language models (LLMs). It also provides guidance on what to include in <code>robots.txt</code> files (more on that below). The data shown below is based on matching those crawler names with user-agent strings in HTTP requests. (Further details about this method, including one exception, can be found at the end of the blog post.)</p><p>The AI crawler landscape saw a significant shift between May 2024 and May 2025, with <code>GPTBot</code> (from OpenAI) emerging as the dominant force, surging from 5% to 30% share, and <code>Meta-ExternalAgent</code> (from Meta) making a strong new entry at 19%. This growth came at the expense of former leader <code>Bytespider</code>, which plummeted from 42% to 7%, as well as other AI crawlers like <code>ClaudeBot</code> and <code>Amazonbot</code>, which also saw declines. Our data clearly indicates a reordering of top AI crawlers, highlighting the increasing prominence of OpenAI and Meta in this category.</p><p><b>May 2024</b></p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3W6ZVHbwe8r5R5pYrZE7Aw/20a6ef0f77c015ae932848861c04b556/image6.png" />
          </figure><p><b>May 2025</b></p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5joaVYfpzHZe7K8VEfCZCV/729f22a39f51d54b80cae35dd38e42b4/image3.png" />
          </figure><table><tr><td><p><b>Rank</b></p></td><td><p><b>Bot Name</b></p></td><td><p><b>Share (May 2024)</b></p></td><td><p><b>Rank</b></p></td><td><p><b>Bot Name</b></p></td><td><p><b>Share (May 2025)</b></p></td></tr><tr><td><p>1</p></td><td><p>Bytespider</p></td><td><p>42%</p></td><td><p>1</p></td><td><p>GPTBot</p></td><td><p>30%</p></td></tr><tr><td><p>2</p></td><td><p>ClaudeBot</p></td><td><p>27%</p></td><td><p>2</p></td><td><p>ClaudeBot</p></td><td><p>21%</p></td></tr><tr><td><p>3</p></td><td><p>Amazonbot</p></td><td><p>21%</p></td><td><p>3</p></td><td><p>Meta-ExternalAgent</p></td><td><p>19%</p></td></tr><tr><td><p>4</p></td><td><p>GPTBot</p></td><td><p>5%</p></td><td><p>4</p></td><td><p>Amazonbot</p></td><td><p>11%</p></td></tr><tr><td><p>5</p></td><td><p>Applebot</p></td><td><p>4.1%</p></td><td><p>5</p></td><td><p>Bytespider</p></td><td><p>7.2%</p></td></tr></table><p>For additional context, the list below includes further information about the bots with higher crawling shares seen above. This information comes from the same open-source <a href="https://github.com/ai-robots-txt/ai.robots.txt/blob/main/robots.json"><u>list</u></a> mentioned above and from publications by companies like <a href="https://platform.openai.com/docs/bots"><u>OpenAI</u></a>, which explain how their crawlers are used. 
</p><ul><li><p><b>GPTBot</b> – OpenAI’s crawler used to improve and train large language models like ChatGPT.</p></li><li><p><b>ClaudeBot</b> – Anthropic’s crawler for training and updating the Claude AI assistant.</p></li><li><p><b>Meta-ExternalAgent</b> – Meta’s bot likely used for collecting data to train or fine-tune LLMs.</p></li><li><p><b>Amazonbot</b> – Amazon’s crawler that gathers data for its search and AI applications.</p></li><li><p><b>Bytespider</b> – ByteDance’s AI data collector, often linked to training models like Ernie or TikTok-related AI.</p></li><li><p><b>Applebot</b> – Apple’s web crawler primarily for Siri and Spotlight search, possibly used in AI development.</p></li><li><p><b>OAI-SearchBot</b> – OpenAI’s search-focused crawler, likely used for retrieving real-time web info for models.</p></li><li><p><b>ChatGPT-User</b> – Represents API-based or browser usage of ChatGPT in connection with user interactions.</p></li><li><p><b>PerplexityBot</b> – Crawler from Perplexity.ai, which powers their AI answer engine using real-time web data.</p></li></ul><p>Webmasters can inform crawler operators of whether they want these bots and crawlers to access their content by setting out rules in a file called <a href="https://www.cloudflare.com/learning/bots/what-is-robots-txt/"><code><u>robots.txt</u></code></a>, which tells crawlers what pages they should or shouldn’t access. <a href="https://blog.cloudflare.com/ai-audit-enforcing-robots-txt/"><u>As we’ve seen recently</u></a>, crawlers honoring your <code>robots.txt</code> policies is voluntary, but Cloudflare announced tools like <a href="https://blog.cloudflare.com/cloudflare-ai-audit-control-ai-content-crawlers/"><u>AI Audit</u></a> to help content creators to enforce it.</p><p>Now, as we’ve seen, the landscape of web crawling is evolving rapidly, driven by the merging roles of search engines and AI. 
AI is now deeply integrated into search, seen in Google’s AI Overviews and AI Mode, but also in social media platforms, like Meta AI on Instagram. So, let's broaden our analysis to include these wider AI-driven crawling activities.</p>
    <div>
      <h2>General AI and search crawling growth: +18%</h2>
      <a href="#general-ai-and-search-crawling-growth-18">
        
      </a>
    </div>
    <p>A broader view reveals the growth of crawling traffic from both search and AI crawlers over the first few months of 2025. To remove customer growth bias, we'll analyze trends using a fixed set of customers from specific weeks (a method we’ve used in our <a href="http://radar.cloudflare.com/year-in-review/"><u>Cloudflare Radar Year in Review</u></a>): the first week of May 2024, a week in November 2024, and the first week of April 2025. </p><p>Using that method, we found that AI and search crawler traffic grew by 18% from May 2024 to May 2025 (comparing full-month periods). The increase was even higher, at 48%, when including new Cloudflare customers added during that time. Peak AI and search crawling traffic occurred in April 2025, with a 32% increase compared to May 2024. This confirms that crawling traffic has clearly risen over the past year, but also that growth is not always constant. Google remains the dominant player, and its share is growing too, as we’ll see in the next section.</p><p>As the next chart shows, crawling traffic increased sharply in March and April 2025 and remained high, though slightly lower, in May.</p>
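<p>The fixed-cohort approach matters because totals across all customers conflate onboarding of new sites with real per-site growth. A toy illustration, with entirely made-up numbers rather than Cloudflare data:</p>

```python
# Why a fixed customer cohort matters: computing growth over all customers
# mixes new signups into the trend, inflating the figure. Numbers are
# illustrative only (requests per site in two measurement periods).
old = {"site-a": 100, "site-b": 200}                 # earlier period
new = {"site-a": 120, "site-b": 230, "site-c": 90}   # site-c onboarded later

cohort = set(old) & set(new)  # sites present in both periods
cohort_growth = (sum(new[s] for s in cohort) / sum(old[s] for s in cohort) - 1) * 100
all_growth = (sum(new.values()) / sum(old.values()) - 1) * 100
```

<p>Here the fixed cohort grows about 16.7%, while the naive all-customer figure is about 46.7%, mirroring the gap between the 18% and 48% figures above.</p>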
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/hePknXM0crXK4jX5e7LxZ/0956ac5024915734a9c0f20c8f15bc16/image4.png" />
          </figure><p>The patterns on the above crawling chart also seem to reflect broader seasonal patterns and general human Internet traffic patterns. In 2024, traffic dropped during the summer in the Northern Hemisphere, with August and September being the least active months. And like overall Internet traffic, it then rose in November, when people are typically more online due to shopping and seasonal habits, as we've seen in <a href="https://blog.cloudflare.com/from-deals-to-ddos-exploring-cyber-week-2024-internet-trends/"><u>past analyses</u></a>. </p>
    <div>
      <h2>Googlebot crawling grew 96% in one year</h2>
      <a href="#googlebot-crawling-grew-96-in-one-year">
        
      </a>
    </div>
    <p><a href="https://developers.google.com/search/docs/crawling-indexing/google-common-crawlers"><code><u>Googlebot</u></code></a>, which indexes content for Google Search, was clearly the top crawler throughout the period and showed strong growth, up 96% from May 2024 to May 2025, reflecting increased crawling by Google. Crawling traffic peaked in April 2025, at 145% above its May 2024 level. It's also worth noting that Google made changes to its search during this time, launching <a href="https://ahrefs.com/blog/google-ai-overviews/"><u>AI Overviews</u></a> in its search engine — first in the US in May 2024, then in more countries later.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1qFVGagpgYIti7p741j8uW/77dc4bc61bec86faa6b80b293997dffd/image1.png" />
          </figure><p>Two trends stand out when looking at daily data for Google-related crawlers, as shown in the graph below. First, <a href="https://developers.google.com/search/docs/crawling-indexing/google-common-crawlers"><code><u>Googlebot</u></code></a> and the more recent <code>GoogleOther</code> (a <a href="https://searchengineland.com/google-launches-new-googlebot-named-googleother-395827"><u>web crawler from 2023</u></a> for “research and development”) account for most of Google’s crawling activity. Second, there were two visible drops in crawling traffic: one on December 14, 2024 (around a Google Search <a href="https://status.search.google.com/incidents/V9nDKuo6nWKh2ThBALgA#:~:text=Incident%20began%20at%202024%2D12,Time"><u>update</u></a>), and another from May 20 to May 28, 2025. That May 20 drop occurred around the same time as the rollout of AI Mode on Google Search in the US, although the timing may be coincidental.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/16kB3kDeprY3LMetEDPS10/8f2bafc7568579377624d6c0aaeb1751/image5.png" />
          </figure>
    <div>
      <h2>Breakdown of top 20 AI and search web crawlers </h2>
      <a href="#breakdown-of-top-20-ai-and-search-web-crawlers">
        
      </a>
    </div>
    <p>Ranking crawlers by their share of total requests gives a clearer picture of which bots are gaining or losing ground, especially among those focused on search and AI. The table below shows a clear trend: some AI bots have grown rapidly since last year (with growth beginning even earlier), while many traditional search crawlers have remained flat or lost share (as in the case of Bing and its <code>Bingbot</code> crawler). The main exception is <code>Googlebot</code>.</p><p>The next table shows the percentage share of each crawler out of all crawling traffic generated by this specific cohort of over 30 AI &amp; search crawlers observed by Cloudflare in May 2024 and May 2025. The table below also includes the change in percentage points and the growth or decline in raw request volume. Crawlers are ranked by their share in May 2025. Key crawler shifts include <code>GPTBot</code> rising sharply (+305%), while <code>Bytespider</code> dropped dramatically (-85%).</p>
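<p>Percentage-point change and raw-request growth measure different things, which is why the table carries both columns. A quick check using GPTBot’s shares from the table (the absolute request counts here are illustrative, since the table reports growth rather than volumes):</p>

```python
# Share change is measured in percentage points (pp); volume change is
# relative growth in raw requests. GPTBot's shares are from the table;
# the absolute request counts below are illustrative.

def pp_change(share_old: float, share_new: float) -> float:
    return share_new - share_old                                # in pp

def raw_growth(old_requests: float, new_requests: float) -> float:
    return (new_requests - old_requests) / old_requests * 100   # in percent

delta = pp_change(2.2, 7.7)     # +5.5 pp, matching the table
growth = raw_growth(100, 405)   # +305%, matching the table
```

<p>A crawler can thus grow its raw requests while losing share (if the total grows faster), which is exactly the pattern several search crawlers show below.</p>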
<div><table><thead>
  <tr>
    <th><span>Rank</span></th>
    <th><span>Bot name</span></th>
    <th><span>Share May 2024</span></th>
    <th><span>Share May 2025</span></th>
    <th><span>Δ percentage-point change</span></th>
    <th><span>Raw requests growth (May 2024 to May 2025)</span></th>
  </tr>
</thead>
<tbody>
  <tr>
    <td><span>1</span></td>
    <td><span>Googlebot</span></td>
    <td><span>30%</span></td>
    <td><span>50%</span></td>
    <td><span>+20 pp</span></td>
    <td><span>96%</span></td>
  </tr>
  <tr>
    <td><span>2</span></td>
    <td><span>Bingbot</span></td>
    <td><span>10%</span></td>
    <td><span>8.7%</span></td>
    <td><span>-1.3 pp</span></td>
    <td><span>2%</span></td>
  </tr>
  <tr>
    <td><span>3</span></td>
    <td><span>GPTBot</span></td>
    <td><span>2.2%</span></td>
    <td><span>7.7%</span></td>
    <td><span>+5.5 pp</span></td>
    <td><span>305%</span></td>
  </tr>
  <tr>
    <td><span>4</span></td>
    <td><span>ClaudeBot</span></td>
    <td><span>11.7%</span></td>
    <td><span>5.4%</span></td>
    <td><span>-6.3 pp</span></td>
    <td><span>-46%</span></td>
  </tr>
  <tr>
    <td><span>5</span></td>
    <td><span>GoogleOther</span></td>
    <td><span>4.4%</span></td>
    <td><span>4.3%</span></td>
    <td><span>-0.1 pp</span></td>
    <td><span>14%</span></td>
  </tr>
  <tr>
    <td><span>6</span></td>
    <td><span>Amazonbot</span></td>
    <td><span>7.6%</span></td>
    <td><span>4.2%</span></td>
    <td><span>-3.4 pp</span></td>
    <td><span>-35%</span></td>
  </tr>
  <tr>
    <td><span>7</span></td>
    <td><span>Googlebot-Image</span></td>
    <td><span>4.5%</span></td>
    <td><span>3.3%</span></td>
    <td><span>-1.2 pp</span></td>
    <td><span>-13%</span></td>
  </tr>
  <tr>
    <td><span>8</span></td>
    <td><span>Bytespider</span></td>
    <td><span>22.8%</span></td>
    <td><span>2.9%</span></td>
    <td><span>-19.8 pp</span></td>
    <td><span>-85%</span></td>
  </tr>
  <tr>
    <td><span>9</span></td>
    <td><span>Yandex</span></td>
    <td><span>2.8%</span></td>
    <td><span>2.2%</span></td>
    <td><span>-0.7 pp</span></td>
    <td><span>-10%</span></td>
  </tr>
  <tr>
    <td><span>10</span></td>
    <td><span>ChatGPT-User</span></td>
    <td><span>0.1%</span></td>
    <td><span>1.3%</span></td>
    <td><span>+1.2 pp</span></td>
    <td><span>2,825%</span></td>
  </tr>
  <tr>
    <td><span>11</span></td>
    <td><span>Applebot</span></td>
    <td><span>1.9%</span></td>
    <td><span>1.2%</span></td>
    <td><span>-0.7 pp</span></td>
    <td><span>-26%</span></td>
  </tr>
  <tr>
    <td><span>12</span></td>
    <td><span>Timpibot</span></td>
    <td><span>0.3%</span></td>
    <td><span>0.6%</span></td>
    <td><span>+0.3 pp</span></td>
    <td><span>133%</span></td>
  </tr>
  <tr>
    <td><span>13</span></td>
    <td><span>Baiduspider</span></td>
    <td><span>0.5%</span></td>
    <td><span>0.4%</span></td>
    <td><span>-0.1 pp</span></td>
    <td><span>7%</span></td>
  </tr>
  <tr>
    <td><span>14</span></td>
    <td><span>PerplexityBot</span></td>
    <td><span>&lt;0.01%</span></td>
    <td><span>0.2%</span></td>
    <td><span>+0.2 pp</span></td>
    <td><span>157,490%</span></td>
  </tr>
  <tr>
    <td><span>15</span></td>
    <td><span>DuckDuckBot</span></td>
    <td><span>0.2%</span></td>
    <td><span>0.1%</span></td>
    <td><span>-0.1 pp</span></td>
    <td><span>-16%</span></td>
  </tr>
  <tr>
    <td><span>16</span></td>
    <td><span>SeznamBot</span></td>
    <td><span>0.1%</span></td>
    <td><span>0.1%</span></td>
    <td></td>
    <td><span>2%</span></td>
  </tr>
  <tr>
    <td><span>17</span></td>
    <td><span>Yeti</span></td>
    <td><span>0.1%</span></td>
    <td><span>0.1%</span></td>
    <td></td>
    <td><span>47%</span></td>
  </tr>
  <tr>
    <td><span>18</span></td>
    <td><span>coccocbot</span></td>
    <td><span>0.1%</span></td>
    <td><span>0.1%</span></td>
    <td></td>
    <td><span>-3%</span></td>
  </tr>
  <tr>
    <td><span>19</span></td>
    <td><span>Sogou</span></td>
    <td><span>0.1%</span></td>
    <td><span>0.1%</span></td>
    <td></td>
    <td><span>-22%</span></td>
  </tr>
  <tr>
    <td><span>20</span></td>
    <td><span>Yahoo! Slurp</span></td>
    <td><span>0.1%</span></td>
    <td><span>0.0%</span></td>
    <td><span>-0.1 pp</span></td>
    <td><span>-8%</span></td>
  </tr>
</tbody></table></div><p>Based on this data, two major shifts in web crawling occurred between May 2024 and May 2025:</p><p><b>1. Some AI crawlers rose sharply.
</b><code>GPTBot</code> (from OpenAI) increased its share from 2.2% to 7.7% (+5.5 pp), with a 305% rise in requests. This underscores the data demand for training large language models like ChatGPT. <code>GPTBot</code> jumped from #9 in May 2024 to #3 in May 2025.</p><p>Another OpenAI crawler, <code>ChatGPT-User</code>, saw requests surge by 2,825%, reaching a 1.3% share. This reflects a large rise in ChatGPT user activity or API-based interactions that involve accessing web content. <code>PerplexityBot</code> (from Perplexity.ai), despite a small 0.2% share, recorded the highest growth rate: a staggering 157,490% increase in raw requests.</p><p>Meanwhile, some AI crawlers saw steep declines. <code>ClaudeBot</code> (Anthropic) fell from 11.7% to 5.4% of total traffic and dropped 46% in requests. <code>Bytespider</code> plummeted 85% in request volume, falling from #2 to #8 in crawler share (now at just 2.9%).</p><p>Both <code>Amazonbot</code> and <code>Applebot</code>, also considered AI crawlers, saw decreases in share and in raw requests (–35% and –26%, respectively).</p><p><b>2. Google’s dominance expanded.
</b><code>Googlebot</code>’s share rose from 30% to 50%, supporting search indexing, but potentially also serving AI-related purposes (such as the new AI Overviews in Google Search). And <code>GoogleOther</code> (the <a href="https://searchengineland.com/google-launches-new-googlebot-named-googleother-395827"><u>crawler introduced in 2023</u></a>) also increased its crawling traffic, by 14%. Other Google crawlers not in the top 20, like <code>Googlebot-News</code>, also grew significantly (+71% in requests). There’s a clear trend of growth in these Google-related web crawlers at a time when the company is investing heavily in combining AI with search.</p><p>Also in the search category, <code>Bingbot</code> (from Microsoft) saw its share decline slightly from 10% to 8.7% (-1.3 pp), though its raw requests still grew modestly by 2%.</p><p>These trends show that web crawling is increasingly dominated by bots from Google and OpenAI, reflecting clear shifts over the course of a year. Google also appears to be adapting how it collects data to support both traditional search and AI-driven features.</p><p>Also worth noting is <code>FriendlyCrawler</code>, which no longer appears in the top 20 list as of May 2025 (now ranked #35). It was #14 in May 2024 with a 0.2% share, but saw a 100% drop in requests by May 2025. This bot is known to index and analyze website content, although its owner and <a href="https://imho.alex-kunz.com/2024/01/25/an-update-on-friendly-crawler/"><u>purpose</u></a> remain unclear. Typically, crawlers like this are used for improving search results, market research, or analytics.</p>
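<p>A note on reading the table: the "pp" column measures change in <i>share</i> of total crawl traffic, while the last column measures growth in <i>raw requests</i>, so a bot can grow in absolute volume yet still lose share when total traffic grows faster (as with <code>GoogleOther</code>). A small sketch of the arithmetic, using GPTBot's published figures with a hypothetical baseline volume:</p>

```javascript
// Shares (2.2% -> 7.7%) and growth (+305%) come from the table above;
// the absolute request counts are hypothetical, chosen only to make
// the arithmetic concrete.
const totalMay2024 = 1_000_000;                // assumed baseline volume
const gptbotMay2024 = totalMay2024 * 0.022;    // 2.2% share
const gptbotMay2025 = gptbotMay2024 * 4.05;    // +305% raw-request growth
const totalMay2025 = gptbotMay2025 / 0.077;    // implied total at a 7.7% share

const ppChange = 7.7 - 2.2;                                 // percentage points
const growth = (gptbotMay2025 / gptbotMay2024 - 1) * 100;   // percent growth
const totalGrowth = (totalMay2025 / totalMay2024 - 1) * 100;

console.log(ppChange.toFixed(1));    // "5.5"
console.log(growth.toFixed(0));      // "305"
console.log(totalGrowth.toFixed(0)); // implied total crawl growth, roughly 16%
```

<p>The same relation explains rows like <code>GoogleOther</code>, whose requests grew 14% even as its share dipped 0.1 pp: the total pool of crawl traffic grew faster than the bot did.</p>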
    <div>
      <h2>robots.txt &amp; AI bots: GPTBot leads twice</h2>
      <a href="#robots-txt-ai-bots-gptbot-leads-twice">
        
      </a>
    </div>
    <p>Recent data from June 6, 2025, from <a href="https://radar.cloudflare.com/ai-insights?dateStart=2025-05-30&amp;dateEnd=2025-06-06"><u>Cloudflare Radar</u></a> shows that out of 3,816 domains (from the <a href="https://radar.cloudflare.com/domains"><u>top 10,000</u></a>) where we were able to find a<i> robots.txt</i> file, 546 (about 14%) had “allow” or “disallow” (fully or partially) directives targeting AI bots in particular.</p><p>This leaves many site owners in a gray area because it’s not always clear how effective <i>robots.txt</i> is in managing AI crawlers. Some site owners may not think to use it specifically for AI bots, while others might be unsure whether these bots even respect <i>robots.txt </i>rules, especially newer or less transparent crawlers. In other cases, sites use partial rules to fine-tune access, trying to balance visibility and protection without fully opting in or out.</p><p>The “disallow” rules appear far more often than “allow” rules. The most frequently blocked bot was <code>GPTBot</code>, disallowed by 312 domains (250 fully, 62 partially), followed by <code>CCBot</code> and <code>Google-Extended</code>, as shown in the following graph.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6CgnH5GZNCIgUAZEeMWTVK/fe608135d5376e936f0ac503e3e9564c/image2.png" />
</figure><p>Although <code>GPTBot</code> was the most blocked, it was also the most explicitly allowed, with 61 domains granting access (18 fully, 43 partially). Still, very few sites openly and explicitly allow AI bots, and when they do, it’s usually for limited sections. Note that bots not listed in a site’s robots.txt are effectively allowed by default.</p><p>As AI crawling increases, more websites are moving from passive signals like <i>robots.txt</i> to active protections like <a href="https://www.cloudflare.com/learning/ddos/glossary/web-application-firewall-waf/"><u>Web Application Firewalls</u></a>. The ecosystem is shifting, with a growing focus on enforceable controls.</p><p><i>Note: When we analyze crawler traffic, we compare user-agent tokens found in robots.txt files (like those for AI crawlers) with the actual user-agent strings in HTTP requests. It's important to note that some robots.txt tokens, such as Google-Extended, aren't user-agent substrings. As described in </i><a href="https://www.rfc-editor.org/rfc/rfc9309.html#name-the-user-agent-line"><i><u>RFC 9309</u></i></a><i>, one goal of these tokens may be to signal the purpose of the crawler. For instance, Google checks for Google-Extended in robots.txt to determine whether your content can be used for AI training, but the traffic itself still comes from standard Google user-agents like Googlebot. Because of this, not every robots.txt entry will have a direct match in HTTP request logs.</i></p>
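<p>To make the directive counts above concrete, here is a hypothetical <i>robots.txt</i> combining the patterns we observed: fully disallowing one AI crawler, partially allowing another, and leaving everything else at the default. The paths and choices are illustrative only:</p>

```txt
# Fully disallow OpenAI's training crawler
User-agent: GPTBot
Disallow: /

# Partially allow Google's AI-training token: everything except /private/
User-agent: Google-Extended
Allow: /
Disallow: /private/

# Any crawler not named above is allowed by default
User-agent: *
Disallow:
```

<p>As noted above, this remains a passive signal: it only works if the crawler chooses to honor it.</p>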
    <div>
      <h2>Conclusion</h2>
      <a href="#conclusion">
        
      </a>
    </div>
<p>As AI crawlers reshape the Internet, websites face both new challenges and new opportunities in managing their online presence.</p><p>This analysis highlights the growing impact of AI on web crawling, showing a clear shift from traditional search indexing to data collection for training AI models. The detailed statistics, such as Googlebot’s continued growth and the rapid rise of AI-specific crawlers, offer context for understanding how this space is evolving and what it means for the future of web content access.</p><p>The trend toward stronger, enforceable blocking methods, something <a href="https://blog.cloudflare.com/cloudflare-ai-audit-control-ai-content-crawlers/"><u>Cloudflare has also invested in</u></a>, signals a key shift in how websites may control their interactions with AI systems going forward.</p> ]]></content:encoded>
            <category><![CDATA[Pay Per Crawl]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Radar]]></category>
            <category><![CDATA[AI Bots]]></category>
            <category><![CDATA[Bots]]></category>
            <guid isPermaLink="false">7KJiiS1zdIyBiVgoT6SgKf</guid>
            <dc:creator>João Tomé</dc:creator>
            <dc:creator>Jorge Pacheco</dc:creator>
            <dc:creator>Carlos Azevedo</dc:creator>
        </item>
        <item>
            <title><![CDATA[Message Signatures are now part of our Verified Bots Program, simplifying bot authentication]]></title>
            <link>https://blog.cloudflare.com/verified-bots-with-cryptography/</link>
            <pubDate>Tue, 01 Jul 2025 10:00:00 GMT</pubDate>
            <description><![CDATA[ Bots can start authenticating to Cloudflare using public key cryptography, preventing them from being spoofed and allowing origins to have confidence in their identity. ]]></description>
            <content:encoded><![CDATA[ <p>As a site owner, how do you know which bots to allow on your site, and which you’d like to block? Existing identification methods rely on a combination of IP address range (which may be shared by other services, or change over time) and user-agent header (easily spoofable). These have limitations and deficiencies. In our <a href="https://blog.cloudflare.com/web-bot-auth/"><u>last blog post</u></a>, we proposed using HTTP Message Signatures: a way for developers of bots, agents, and crawlers to clearly identify themselves by cryptographically signing requests originating from their service. </p><p>Since we published the blog post on Message Signatures and the <a href="https://datatracker.ietf.org/doc/html/draft-meunier-web-bot-auth-architecture"><u>IETF draft for Web Bot Auth</u></a> in May 2025, we’ve seen significant interest around implementing and deploying Message Signatures at scale. It’s clear that well-intentioned bot owners want a clear way to identify their bots to site owners, and site owners want a clear way to identify and manage bot traffic. Both parties seem to agree that deploying cryptography for the purposes of authentication is the right solution.     </p><p>Today, we’re announcing that we’re integrating HTTP Message Signatures directly into our <b>Verified Bots Program</b>. This announcement has two main parts: (1) for bots, crawlers, and agents, we’re simplifying enrollment into the Verified Bots program for those who sign requests using Message Signatures, and (2) we’re encouraging <i>all bot operators moving forward </i>to use Message Signatures over existing verification mechanisms. 
Because Verified Bots are already authenticated, our Bot Management does not challenge them to prove they are bots.</p><p>For site owners, no additional action is required – Cloudflare will automatically validate signatures at our edge, and if validation succeeds, that traffic will be marked as verified so that site owners can use the <a href="https://developers.cloudflare.com/bots/concepts/bot/verified-bots/categories/"><u>verified bot fields</u></a> to create Bot Management and <a href="https://developers.cloudflare.com/waf/custom-rules/"><u>WAF rules</u></a> based on it.</p><p>This isn't just about simplifying things for bot operators — it’s about giving website owners unparalleled accuracy in identifying trusted bot traffic, cutting down on the overhead for cryptographic verification, and fundamentally transforming how we manage authentication across the Cloudflare network.</p>
    <div>
      <h2>Become a Verified Bot with Message Signatures</h2>
      <a href="#become-a-verified-bot-with-message-signatures">
        
      </a>
    </div>
    <p>Cloudflare’s existing <a href="https://developers.cloudflare.com/bots/concepts/bot/verified-bots/"><u>Verified Bots program</u></a> is for bots that are transparent about who they are and what they do, like indexing sites for search or scanning for security vulnerabilities. You can see a list of these verified bots in <a href="https://radar.cloudflare.com/bots#verified-bots"><u>Cloudflare Radar</u></a>:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2lMYno3QOwtwfTDDgeqFx8/c69088229dcf9fc08f5a76ce7e0a0354/1.png" />
          </figure><p><sup><i>A preview of the Verified Bots page on Cloudflare Radar. </i></sup></p><p>In the past, in order to <a href="https://dash.cloudflare.com/?to=/:account/configurations/verified-bots"><u>apply</u></a> to be a verified bot, we used to ask for IP address ranges or reverse DNS names so that we could verify your identity. This required some manual steps like checking that the IP address range is valid and is associated with the appropriate <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/"><u>ASN</u></a>. </p><p>With the integration of Message Signatures, we’re aiming to streamline applications into our Verified Bot program. Bots applying with well-formed Message Signatures will be prioritized, and approved more quickly! </p>
    <div>
      <h2>Getting started</h2>
      <a href="#getting-started">
        
      </a>
    </div>
<p>In order to make generating Message Signatures as easy as possible, Cloudflare is providing two open source libraries: a <a href="https://crates.io/crates/web-bot-auth"><u>web-bot-auth library in Rust</u></a>, and a <a href="https://www.npmjs.com/package/web-bot-auth"><u>web-bot-auth npm package in TypeScript</u></a>. If you’re working on a different implementation, <a href="https://www.cloudflare.com/lp/verified-bots/"><u>let us know</u></a> – we’d love to add it to our <a href="https://developers.cloudflare.com/bots/concepts/bot/verified-bots/web-bot-auth/"><u>developer docs</u></a>!</p><p>At a high level, signing your requests with web bot auth consists of the following steps: </p><ul><li><p>Generate a valid signing key. See <a href="https://developers.cloudflare.com/bots/concepts/bot/verified-bots/web-bot-auth/#1-generate-a-valid-signing-key"><u>Signing Key section</u></a> for step-by-step instructions.</p></li><li><p>Host a JSON web key set containing your public key under <code>/.well-known/http-message-signature-directory</code> of your website.</p></li><li><p>Sign responses served from that URL using a Web Bot Auth library (one signature for each key in the set) to prove you control it. See the <a href="https://developers.cloudflare.com/bots/concepts/bot/verified-bots/web-bot-auth/#2-host-a-key-directory"><u>Hosting section</u></a> for step-by-step instructions.</p></li><li><p>Register that URL with us, using our Verified Bots form. This can be done directly in your Cloudflare account. See <a href="https://developers.cloudflare.com/bots/concepts/bot/verified-bots/overview/"><u>our documentation</u></a>.</p></li><li><p>Sign requests using a Web Bot Auth library. </p></li></ul><p>
As an example, <a href="https://radar.cloudflare.com/scan"><u>Cloudflare Radar's URL Scanner</u></a> lets you scan any URL and get a publicly shareable report with security, performance, technology, and network information. Here’s an example of what a well-formed signature looks like for requests coming from URL Scanner:</p>
            <pre><code>GET /path/to/resource HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36
Signature-Agent: "https://web-bot-auth-directory.radar-cfdata-org.workers.dev"
Signature-Input: sig=("@authority" "signature-agent");\
             	 created=1700000000;\
             	 expires=1700011111;\
             	 keyid="poqkLGiymh_W0uP6PZFw-dvez3QJT5SolqXBCW38r0U";\
             	 tag="web-bot-auth"
Signature: sig=:jdq0SqOwHdyHr9+r5jw3iYZH6aNGKijYp/EstF4RQTQdi5N5YYKrD+mCT1HA1nZDsi6nJKuHxUi/5Syp3rLWBA==:</code></pre>
            <p>Since we’ve already registered URLScanner as a Verified Bot, Cloudflare will now automatically verify that the signature in the <code>Signature</code> header matches the request — more on that later.</p>
    <div>
      <h2>Register your bot</h2>
      <a href="#register-your-bot">
        
      </a>
    </div>
    <p>Access the <a href="https://dash.cloudflare.com/?to=/:account/configurations/verified-bots"><u>Verified Bots submission form</u></a> on your account. If that link does not immediately take you there, go to <i>your Cloudflare account</i> →  <i>Account Home</i>  → <i>the three dots next to your account name</i>  → <i>Configurations</i> → <i>Verified Bots.</i></p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/73yQcvLmiVDe19HJXYvBIc/ca2bdb2bb81addc29583568087c2ccc2/3.png" />
          </figure><p>If you do not have a Cloudflare account, you can <a href="https://dash.cloudflare.com/sign-up"><u>sign up for a free one</u></a>.</p><p>For the verification method, select "Request Signature", then enter the URL of your key directory in Validation Instructions. Specifying the User-Agent values is optional if you’re submitting a Request Signature bot. </p><p>Once your application has gone through our (now shortened) review process, you don’t need to take any further action.</p>
    <div>
      <h2>Message Signature verification for origins</h2>
      <a href="#message-signature-verification-for-origins">
        
      </a>
    </div>
<p>Starting today, Cloudflare is ramping up verification of <a href="https://datatracker.ietf.org/doc/html/draft-meunier-web-bot-auth-architecture"><u>cryptographic signatures provided by automated crawlers and bots</u></a>. This is currently available for all Free and Pro plans, and as we continue to test and validate at scale, it will be released to all Business and Enterprise plans. This means that as time passes, the number of unauthenticated web crawlers should diminish, ensuring most bot traffic is authenticated before it reaches your website’s servers, helping to prevent spoofing attacks. </p><p>At a high level, signature verification works like this: </p><ol><li><p>A bot or agent sends a request to a website behind Cloudflare.</p></li><li><p>Cloudflare’s Message Signature verification service checks for the <code>Signature</code>, <code>Signature-Input</code>, and <code>Signature-Agent</code> headers.</p></li><li><p>It checks that the incoming request presents a <code>keyid</code> parameter in its <code>Signature-Input</code> that points to a key we already know.</p></li><li><p>It looks at the <code>expires</code> parameter in the incoming bot request. If the current time is after expiration, verification fails. This guards against replay attacks, preventing malicious agents from trying to pass as a bot by retrying messages they captured in the past.</p></li><li><p>It checks that the <code>tag</code> parameter is set to <code>web-bot-auth</code>, signaling that the message should be handled via web bot authentication specifically.</p></li><li><p>It looks at all the <a href="https://www.rfc-editor.org/rfc/rfc9421#covered-components"><u>components</u></a> chosen in the <code>Signature-Input</code> header, and constructs <a href="https://www.rfc-editor.org/rfc/rfc9421#name-creating-the-signature-base"><u>a signature base</u></a> from them. 
</p></li><li><p>If all pre-flight checks pass, Cloudflare attempts to verify the signature base against the value in the <code>Signature</code> field using an <a href="https://www.rfc-editor.org/rfc/rfc9421#name-eddsa-using-curve-edwards25"><u>ed25519 verification algorithm</u></a> and the key supplied in <code>keyid</code>.</p></li><li><p>Verified Bots and other systems at Cloudflare use a successful verification as proof of your identity, and apply rules corresponding to that identity.</p></li></ol><p>If any of the above steps fail, Cloudflare falls back to existing bot identification and mitigation mechanisms. As the system matures, we will strengthen these requirements and limit the possibility of a soft downgrade.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/128Ox15wBqBPVKUUzvn4gA/acca9b9e6df243b8317b8964285ce57c/2.png" />
</figure><p>As a site owner, you can segment your Verified Bot traffic by its type and purpose by adding the <a href="https://developers.cloudflare.com/bots/concepts/bot/verified-bots/categories/"><u>Verified Bot Categories</u></a> field <code>cf.verified_bot_category</code> as a filter criterion in <a href="https://developers.cloudflare.com/waf/custom-rules/"><u>WAF Custom rules</u></a>, <a href="https://developers.cloudflare.com/waf/rate-limiting-rules/"><u>Advanced Rate Limiting</u></a>, and <a href="https://developers.cloudflare.com/rules/transform/"><u>Transform rules</u></a>. For instance, to allow institutions dedicated to academic research, such as the Bibliothèque nationale de France and the Library of Congress, you can add a rule that allows bots in the <code>Academic Research</code> category.</p>
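<p>The filter expression for such a rule could be as simple as the following; the field name and category value come from the Verified Bot Categories documentation linked above, while pairing it with a Skip or Allow action is our illustrative choice:</p>

```txt
(cf.verified_bot_category eq "Academic Research")
```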
    <div>
      <h2>Where we’re going next</h2>
      <a href="#where-were-going-next">
        
      </a>
    </div>
    <p>HTTP Message Signatures is a primitive that is useful beyond Cloudflare – the IETF standardized it as part of <a href="https://datatracker.ietf.org/doc/html/rfc9421"><u>RFC 9421</u></a>.</p><p>As discussed in our <a href="https://blog.cloudflare.com/web-bot-auth/#introducing-http-message-signatures"><u>previous blog post</u></a>, Cloudflare believes that making Message Signatures a core component of bot authentication on the web should follow the same path. The <a href="https://www.ietf.org/archive/id/draft-meunier-web-bot-auth-architecture-02.html"><u>specifications</u></a> for the protocol are being built in the open, and they have already evolved following feedback.</p><p>Moreover, due to widespread interest, the IETF is considering forming a working group around <a href="https://datatracker.ietf.org/wg/webbotauth/about/"><u>Web Bot Auth</u></a>. Should you be a crawler, an origin, or even a CDN, we invite you to provide feedback to ensure the solution gets stronger, and suits your needs.</p>
    <div>
      <h2>A better, more trusted Internet</h2>
      <a href="#a-better-more-trusted-internet">
        
      </a>
    </div>
<p>For bot, agent, and crawler operators that act transparently and provide vital services for the Internet, we’re providing a faster and more automated path to being recognized as a Verified Bot, reducing manual processes. We trust that this approach replaces formerly brittle and unreliable authentication methods with a secure and reliable alternative. It should reduce the overall friction and hurdles genuinely useful bots face.</p><p>For site owners, Message Signatures provide better assurance that bot traffic is legitimate — automatically recognized and allowed, minimizing disruption to essential services (e.g., search engine indexing, monitoring). In line with our commitments to making TLS/<a href="https://blog.cloudflare.com/introducing-universal-ssl/"><u>SSL</u></a> and <a href="https://blog.cloudflare.com/pt-br/post-quantum-zero-trust/"><u>Post-Quantum</u></a> certificates available for everyone, we’ll always offer cryptographic verification of Message Signatures for all sites, because fostering a trusted environment for both human and automated traffic makes for a safer, more efficient Internet.</p><p>If you have a feature request, feedback, or are interested in partnering with us, please <a href="https://www.cloudflare.com/lp/verified-bots/"><u>reach out</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[Pay Per Crawl]]></category>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[Bot Management]]></category>
            <category><![CDATA[AI Bots]]></category>
            <category><![CDATA[Cryptography]]></category>
            <guid isPermaLink="false">5K5btgE8vXWGaGxCrs5yFH</guid>
            <dc:creator>Mari Galicer</dc:creator>
            <dc:creator>Akshat Mahajan</dc:creator>
            <dc:creator>Gauri Baraskar</dc:creator>
            <dc:creator>Helen Du</dc:creator>
        </item>
        <item>
            <title><![CDATA[Forget IPs: using cryptography to verify bot and agent traffic]]></title>
            <link>https://blog.cloudflare.com/web-bot-auth/</link>
            <pubDate>Thu, 15 May 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ Bots now browse like humans. We're proposing bots use cryptographic signatures so that website owners can verify their identity. Explanations and demonstration code can be found within the post. ]]></description>
<content:encoded><![CDATA[ <p>With the rise of traffic from <a href="https://www.cloudflare.com/learning/ai/what-is-agentic-ai/">AI agents</a>, what’s considered a bot is no longer clear-cut. There are some clearly malicious bots, like ones that DoS your site or do <a href="https://www.cloudflare.com/learning/bots/what-is-credential-stuffing/">credential stuffing</a>, and there are bots that most site owners do want interacting with their site, like the bot that indexes your site for a search engine, or ones that fetch RSS feeds.</p><p>Historically, Cloudflare has relied on two main signals to distinguish legitimate web crawlers from other types of automated traffic: user agent headers and IP addresses. The <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/User-Agent"><code><u>User-Agent</u></code><u> header</u></a> allows bot developers to identify themselves, e.g. <code>MyBotCrawler/1.1</code>. However, user agent headers alone are easily spoofed and are therefore insufficient for reliable identification. To address this, user agent checks are often supplemented with <a href="https://developers.cloudflare.com/bots/concepts/bot/verified-bots/policy/#ip-validation"><u>IP address validation</u></a>, the inspection of published IP address ranges to confirm a crawler's authenticity. However, the logic around IP address ranges representing a product or group of users is brittle – connections from the crawling service might be shared by multiple users, such as in the case of <a href="https://blog.cloudflare.com/icloud-private-relay/"><u>privacy proxies</u></a> and VPNs, and these ranges, often maintained by cloud providers, change over time.</p><p>Cloudflare will always try to block malicious bots, but we think our role here is to also provide an affirmative mechanism to authenticate desirable bot traffic. 
By using well-established cryptography techniques, we’re proposing a better mechanism for legitimate agents and bots to declare who they are, and provide a clearer signal for site owners to decide what traffic to permit. </p><p><b>Today, we’re introducing two proposals – HTTP message signatures and request mTLS – for </b><a href="https://blog.cloudflare.com/friendly-bots/"><b><u>friendly bots</u></b></a><b> to authenticate themselves, and for customer origins to identify them. </b>In this blog post, we’ll share how these authentication mechanisms work, how we implemented them, and how you can participate in our closed beta.</p>
    <div>
      <h2>Existing bot verification mechanisms are broken </h2>
      <a href="#existing-bot-verification-mechanisms-are-broken">
        
      </a>
    </div>
    <p>Historically, if you’ve worked on ChatGPT, Claude, Gemini, or any other agent, you’ve had several options to identify your HTTP traffic to other services: </p><ol><li><p>You define a <a href="https://www.rfc-editor.org/rfc/rfc9110#name-user-agent"><u>user agent</u></a>, an HTTP header described in <a href="https://www.rfc-editor.org/rfc/rfc9110.html#name-user-agent"><u>RFC 9110</u></a>. The problem here is that this header is easily spoofable and there’s not a clear way for agents to identify themselves as semi-automated browsers — agents often use the Chrome user agent for this very reason, which is discouraged. The RFC <a href="https://www.rfc-editor.org/rfc/rfc9110.html#section-10.1.5-9"><u>states</u></a>: 
<i>“If a user agent masquerades as a different user agent, recipients can assume that the user intentionally desires to see responses tailored for that identified user agent, even if they might not work as well for the actual user agent being used.” </i> </p></li><li><p>You publish your IP address range(s). This has limitations because the same IP address might be shared by multiple users or multiple services within the same company, or even by multiple companies when hosting infrastructure is shared (like <a href="https://www.cloudflare.com/developer-platform/products/workers/">Cloudflare Workers</a>, for example). In addition, IP addresses are prone to change as underlying infrastructure changes, leading services to use ad-hoc sharing mechanisms like <a href="https://www.cloudflare.com/ips-v4"><u>CIDR lists</u></a>. </p></li><li><p>You go to every website and share a secret, like a <a href="https://www.rfc-editor.org/rfc/rfc6750"><u>Bearer</u></a> token. This is impractical at scale because it requires developers to maintain separate tokens for each website their bot will visit.</p></li></ol><p>We can do better! Instead of these arduous methods, we’re proposing that developers of bots and agents cryptographically sign requests originating from their service. When protecting origins, <a href="https://www.cloudflare.com/learning/cdn/glossary/reverse-proxy/">reverse proxies</a> such as Cloudflare can then validate those signatures to confidently identify the request source on behalf of site owners, allowing them to take action as they see fit. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3yB6h6XcSWNQO5McRWIpL8/edf32f7938b01a4c8f5eedefee2b9328/image2.png" />
          </figure><p>A typical system has three actors:</p><ul><li><p>User: the entity that wants to perform some actions on the web. This may be a human, an automated program, or anything taking action to retrieve information from the web.</p></li><li><p>Agent: an orchestrated browser or software program. For example, Chrome on your computer, or OpenAI’s <a href="https://operator.chatgpt.com/"><u>Operator</u></a> with ChatGPT. Agents can interact with the web according to web standards (HTML rendering, JavaScript, subrequests, etc.).</p></li><li><p>Origin: the website hosting a resource. The user wants to access it through the browser. This is Cloudflare when your website is using our services, and it’s your own server(s) when exposed directly to the Internet.</p></li></ul><p>In the next section, we’ll dive into HTTP Message Signatures and request mTLS, two mechanisms a browser agent may implement to sign outgoing requests, with different levels of ease for an origin to adopt. </p>
    <div>
      <h2>Introducing HTTP Message Signatures</h2>
      <a href="#introducing-http-message-signatures">
        
      </a>
    </div>
<p><a href="https://www.rfc-editor.org/rfc/rfc9421.html"><u>HTTP Message Signatures</u></a> is a standard that defines the cryptographic authentication of a request sender. It’s essentially a cryptographically sound way to say, “hey, it’s me!”. It’s not the only way that developers can sign requests from their infrastructure — for example, AWS has used <a href="https://docs.aws.amazon.com/AmazonS3/latest/API/sig-v4-authenticating-requests.html"><u>Signature v4</u></a>, and Stripe has a framework for <a href="https://docs.stripe.com/webhooks#verify-webhook-signatures-with-official-libraries"><u>authenticating webhooks</u></a> — but Message Signatures is a published standard, and the cleanest, most developer-friendly way to sign requests.</p><p>We’re working closely with the wider industry to support these standards-based approaches. For example, OpenAI has started to sign their requests. In their own words:</p><blockquote><p><i>"Ensuring the authenticity of Operator traffic is paramount. With HTTP Message Signatures (</i><a href="https://www.rfc-editor.org/rfc/rfc9421.html"><i><u>RFC 9421</u></i></a><i>), OpenAI signs all Operator requests so site owners can verify they genuinely originate from Operator and haven’t been tampered with” </i>– Eugenio, Engineer, OpenAI</p></blockquote><p>Without further delay, let’s dive into how HTTP Message Signatures work to identify bot traffic.</p>
    <div>
      <h3>Scoping standards to bot authentication</h3>
      <a href="#scoping-standards-to-bot-authentication">
        
      </a>
    </div>
    <p>Generating a message signature works like this: before sending a request, the agent signs the target origin with its private key. When fetching <code>https://example.com/path/to/resource</code>, it signs <code>example.com</code>. The corresponding public key is known to the origin, either because the agent is well known, because it has previously registered, or through some other method. Then, the agent writes a <b>Signature-Input</b> header with the following parameters:</p><ol><li><p>A validity window (<code>created</code> and <code>expires</code> timestamps)</p></li><li><p>A Key ID that uniquely identifies the key used in the signature. This is a <a href="https://www.rfc-editor.org/rfc/rfc7638.html"><u>JSON Web Key Thumbprint</u></a>.</p></li><li><p>A tag that shows websites the signature’s purpose and validation method, i.e. <code>web-bot-auth</code> for bot authentication.</p></li></ol><p>In addition, the <code>Signature-Agent</code> header indicates where the origin can find the public keys the agent used when signing the request, such as in a directory hosted by <code>signer.example.com</code>. This header is part of the signed content as well.</p><p>Here’s an example:</p>
            <pre><code>GET /path/to/resource HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 Chrome/113.0.0 MyBotCrawler/1.1
Signature-Agent: signer.example.com
Signature-Input: sig=("@authority" "signature-agent");\
             	 created=1700000000;\
             	 expires=1700011111;\
             	 keyid="ba3e64==";\
             	 tag="web-bot-auth"
Signature: sig=abc==</code></pre>
            <p>For those building bots, <a href="https://datatracker.ietf.org/doc/draft-meunier-web-bot-auth-architecture/"><u>we propose</u></a> signing the authority of the target URI (www.example.com in the example above), along with a way to retrieve the bot’s public key in the form of <a href="https://datatracker.ietf.org/doc/draft-meunier-http-message-signatures-directory/"><u>signature-agent</u></a>, if present: <a href="http://crawler.search.google.com"><u>crawler.search.google.com</u></a> for Google Search, <a href="http://operator.openai.com"><u>operator.openai.com</u></a> for OpenAI Operator, or workers.dev for Cloudflare Workers.</p><p>The <code>User-Agent</code> in the example above indicates that the software making the request is Chrome, because the agent browses the web through an orchestrated Chrome. Note that <code>MyBotCrawler/1.1</code> is still present: the <code>User-Agent</code> header can contain multiple products, listed in decreasing order of significance. Since our agent makes its requests via Chrome, Chrome is the most significant product and comes first.</p><p>At Internet scale, these signatures may add notable overhead to request processing. However, with the right cryptographic suite, and compared to the cost of existing bot mitigation, both technical and social, this seems a straightforward tradeoff. This is a metric we will monitor closely, and report on as adoption grows.</p>
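<p>To make the parameters above concrete, here is a minimal TypeScript sketch (our own illustration, not the npm package’s API, and not a full RFC 8941 structured-fields parser) that extracts the web-bot-auth parameters from a <code>Signature-Input</code> header like the one shown:</p>

```typescript
// Pull the signed component list and the created/expires/keyid/tag
// parameters out of a single-line Signature-Input header value.
interface SignatureParams {
  components: string[];
  created: number;
  expires: number;
  keyid: string;
  tag: string;
}

function parseSignatureInput(header: string): SignatureParams {
  const match = header.match(/^sig=\(([^)]*)\);(.*)$/);
  if (match === null) throw new Error("malformed Signature-Input");
  const [, components, params] = match;
  const get = (name: string) =>
    params.match(new RegExp(`${name}=("?)([^;"]*)\\1`))?.[2];
  return {
    components: components.split(" ").map((c) => c.replace(/"/g, "")),
    created: Number(get("created")),
    expires: Number(get("expires")),
    keyid: get("keyid") ?? "",
    tag: get("tag") ?? "",
  };
}

// The example header above, folded onto one line:
const parsed = parseSignatureInput(
  'sig=("@authority" "signature-agent");created=1700000000;expires=1700011111;keyid="ba3e64==";tag="web-bot-auth"'
);
// An origin would reject the request if parsed.tag is not "web-bot-auth",
// or if the current time falls outside [created, expires].
```

A real verifier would parse the header with a proper structured-fields library; this sketch only shows which fields matter for the bot-authentication flow.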
    <div>
      <h3>Generating request signatures</h3>
      <a href="#generating-request-signatures">
        
      </a>
    </div>
    <p>We’re making several examples of generating Message Signatures for bots and agents <a href="https://github.com/cloudflareresearch/web-bot-auth/"><u>available on GitHub</u></a> (though we encourage other implementations!), all of them standards-compliant to maximize interoperability.</p><p>Imagine you’re building an agent using a managed Chromium browser, and want to sign all outgoing requests. To achieve this, the <a href="https://github.com/w3c/webextensions"><u>webextensions standard</u></a> provides <a href="https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/WebExtensions/API/webRequest/onBeforeSendHeaders"><u>chrome.webRequest.onBeforeSendHeaders</u></a>, which lets you modify HTTP headers before the browser sends them. The event is <a href="https://developer.chrome.com/docs/extensions/reference/api/webRequest#life_cycle_of_requests"><u>triggered</u></a> before any HTTP data is sent, once headers are available.</p><p>Here’s what that code would look like:</p>
            <pre><code>chrome.webRequest.onBeforeSendHeaders.addListener(
  function (details) {
    // Signature and header assignment logic goes here
    // &lt;CODE&gt;
  },
  { urls: ["&lt;all_urls&gt;"] },
  ["blocking", "requestHeaders"] // requires "installation_mode": "force_installed"
);</code></pre>
            <p>Cloudflare provides a <a href="https://www.npmjs.com/package/web-bot-auth"><u>web-bot-auth</u></a> helper package on npm that generates request signatures with the correct parameters. A blocking <code>onBeforeSendHeaders</code> listener must return synchronously, so we use the package’s synchronous signer: <code>import { signatureHeadersSync } from "web-bot-auth"</code>. Once signing completes, both <code>Signature</code> and <code>Signature-Input</code> headers are assigned, and the request flow can continue.</p>
            <pre><code>const request = new URL(details.url);
const created = new Date();
const expires = new Date(created.getTime() + 300_000); // valid for 5 minutes

// Perform the request signature
const headers = signatureHeadersSync(
  request,
  new Ed25519Signer(jwk), // jwk: the agent's Ed25519 signing key in JWK form
  { created, expires }
);
// `headers` object now contains `Signature` and `Signature-Input` headers that can be used</code></pre>
            <p>This extension code is available on <a href="https://github.com/cloudflareresearch/web-bot-auth/"><u>GitHub</u></a>, alongside a  debugging server, deployed at <a href="https://http-message-signatures-example.research.cloudflare.com"><u>https://http-message-signatures-example.research.cloudflare.com</u></a>. </p>
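<p>One detail worth sketching: the <code>keyid</code> carried in <code>Signature-Input</code> is a JSON Web Key Thumbprint (RFC 7638). For an Ed25519 (OKP) key, the required members are <code>crv</code>, <code>kty</code>, and <code>x</code>, serialized with lexicographically ordered keys and no whitespace, then hashed. A sketch (the key material below is an illustrative placeholder, not a real agent’s key):</p>

```typescript
import { createHash } from "node:crypto";

// RFC 7638: SHA-256 over the JWK's required members, serialized with
// lexicographically ordered keys and no whitespace, base64url-encoded.
function jwkThumbprint(jwk: { crv: string; kty: string; x: string }): string {
  // JSON.stringify preserves insertion order, so list crv, kty, x explicitly.
  const canonical = JSON.stringify({ crv: jwk.crv, kty: jwk.kty, x: jwk.x });
  return createHash("sha256").update(canonical).digest("base64url");
}

// Placeholder Ed25519 public key material:
const keyid = jwkThumbprint({
  crv: "Ed25519",
  kty: "OKP",
  x: "11qYAYKxCrfVS_7TyWQHOg7hcvPapiMlrwIaaPcHURo",
});
// keyid is a 43-character base64url string usable in Signature-Input.
```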
    <div>
      <h3>Validating request signatures </h3>
      <a href="#validating-request-signatures">
        
      </a>
    </div>
    <p>Using our <a href="https://http-message-signatures-example.research.cloudflare.com"><u>debug server</u></a>, we can now inspect and validate our request signatures from the perspective of the website we’d be visiting. We should now see the Signature and Signature-Input headers:  </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/18P5OyGxu2fU0Dpyv70Gjz/d82d62355524ad1914deb41b601bcad2/image3.png" />
          </figure><p><sup><i>In this example, the homepage of the debugging server validates the signature from the RFC 9421 Ed25519 verifying key, which the extension uses for signing.</i></sup></p><p>The above demo and code walkthrough were written entirely in TypeScript: the verification website runs on Cloudflare Workers, and the client is a Chrome browser extension. We are aware that this does not suit all clients and servers on the web. To demonstrate the proposal works in more environments, we have also implemented bot signature validation in Go with a <a href="https://github.com/cloudflareresearch/web-bot-auth/tree/main/examples/caddy-plugin"><u>plugin</u></a> for <a href="https://caddyserver.com/"><u>Caddy server</u></a>.</p>
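<p>Cryptographic verification is only half of validation: the origin must also enforce the signature parameters themselves. A small illustrative sketch of those policy checks (our own names, not the API of any particular library):</p>

```typescript
// Reject a signature whose parameters are unacceptable, regardless of
// whether the signature bytes themselves verify.
function checkSignatureParams(
  params: { created: number; expires: number; tag: string },
  nowSec: number,
  maxWindowSec: number = 3600,
): boolean {
  if (params.tag !== "web-bot-auth") return false; // wrong purpose
  if (params.created > nowSec) return false; // not yet valid (or clock skew)
  if (params.expires <= nowSec) return false; // expired
  return params.expires - params.created <= maxWindowSec; // sane lifetime
}

// A signature created 60s ago that expires in 240s passes:
checkSignatureParams({ created: 1000, expires: 1300, tag: "web-bot-auth" }, 1060); // true
```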
    <div>
      <h2>Experimentation with request mTLS</h2>
      <a href="#experimentation-with-request-mtls">
        
      </a>
    </div>
    <p>HTTP is not the only way to convey signatures. For instance, one mechanism that has been used in the past to authenticate automated traffic against secured endpoints is <a href="https://www.cloudflare.com/learning/access-management/what-is-mutual-tls/"><u>mTLS</u></a>, the “mutual” presentation of <a href="https://www.cloudflare.com/application-services/products/ssl/">TLS certificates</a>. As described in our <a href="https://www.cloudflare.com/learning/access-management/what-is-mutual-tls/"><u>knowledge base</u></a>:</p><blockquote><p><i>Mutual TLS, or mTLS for short, is a method for</i><a href="https://www.cloudflare.com/learning/access-management/what-is-mutual-authentication/"><i> </i><i><u>mutual authentication</u></i></a><i>. mTLS ensures that the parties at each end of a network connection are who they claim to be by verifying that they both have the correct private</i><a href="https://www.cloudflare.com/learning/ssl/what-is-a-cryptographic-key/"><i> </i><i><u>key</u></i></a><i>. The information within their respective</i><a href="https://www.cloudflare.com/learning/ssl/what-is-an-ssl-certificate/"><i> </i><i><u>TLS certificates</u></i></a><i> provides additional verification.</i></p></blockquote><p>While mTLS seems like a good fit for bot authentication on the web, it has limitations. If a user is asked for authentication via the mTLS protocol but does not have a certificate to provide, they would get an inscrutable and unskippable error. Origin sites need a way to conditionally signal to clients that they accept or require mTLS authentication, so that only mTLS-enabled clients use it.</p>
    <div>
      <h3>A TLS flag for bot authentication</h3>
      <a href="#a-tls-flag-for-bot-authentication">
        
      </a>
    </div>
    <p>TLS flags are an efficient way to describe whether a feature, like mTLS, is supported by origin sites. Within the IETF, we have proposed a new TLS flag called <a href="https://datatracker.ietf.org/doc/draft-jhoyla-req-mtls-flag/"><code><u>req mTLS</u></code></a>, sent by the client during the establishment of a connection, that signals support for authentication via a client certificate.</p><p>This proposal leverages the <a href="https://www.ietf.org/archive/id/draft-ietf-tls-tlsflags-14.html"><u>tls-flags</u></a> proposal under discussion in the IETF. The TLS Flags draft allows clients and servers to send an array of one-bit flags to each other, rather than creating a new extension (with its associated overhead) for each piece of information they want to share. This is one of the first uses of this extension, and we hope that by using it here we can help drive adoption.</p><p>When a client sends the <a href="https://datatracker.ietf.org/doc/draft-jhoyla-req-mtls-flag/"><code><u>req mTLS</u></code></a> flag, it signals to the server that it is able to respond with a certificate if requested. The server can then safely request a certificate without risk of blocking ordinary user traffic, because ordinary users will never set this flag.</p><p>Let’s look at an example of the <code>req mTLS</code> flag in <a href="https://www.wireshark.org/"><u>Wireshark</u></a>, a network protocol analyser. You can follow along with the packet capture <a href="https://github.com/cloudflareresearch/req-mtls/tree/main/assets/demonstration-capture.pcapng"><u>here</u></a>.</p>
            <pre><code>Extension: req mTLS (len=12)
	Type: req mTLS (65025)
	Length: 12
	Data: 0b0000000000000000000001</code></pre>
            <p>The extension number is 65025, or 0xfe01. This corresponds to an unassigned block of <a href="https://www.iana.org/assignments/tls-extensiontype-values/tls-extensiontype-values.xhtml#tls-extensiontype-values-1"><u>TLS extensions</u></a> that can be used to experiment with TLS Flags. Once the standard is adopted and published by the IETF, this number will be fixed. To use the <code>req mTLS</code> flag, the client sets the 80<sup>th</sup> bit to true; with our block length of 12 bytes, the data should contain 0b0000000000000000000001, which is the case here. The server then responds with a certificate request, and the request follows its course.</p>
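<p>The byte layout above can be reproduced mechanically. Assuming the encoding implied by the capture (a one-byte length prefix followed by a bitfield in which flag N occupies the low-order bit N % 8 of octet N / 8), a TypeScript sketch:</p>

```typescript
// Encode a set of TLS flag numbers into the hex extension data shown above:
// a one-byte length prefix, then a bitfield sized to the highest flag.
function encodeTlsFlags(flags: number[]): string {
  const field = new Uint8Array(Math.floor(Math.max(...flags) / 8) + 1);
  for (const f of flags) {
    field[Math.floor(f / 8)] |= 1 << (f % 8);
  }
  const bytes = [field.length, ...field];
  return bytes.map((b) => b.toString(16).padStart(2, "0")).join("");
}

// req mTLS is flag 80: an 11-byte bitfield whose final low-order bit is set,
// preceded by the 0x0b length byte, matching the 12-byte capture above.
encodeTlsFlags([80]); // "0b0000000000000000000001"
```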
    <div>
      <h3>Request mTLS in action</h3>
      <a href="#request-mtls-in-action">
        
      </a>
    </div>
    <p><i>Code for this section is available in GitHub under </i><a href="https://github.com/cloudflareresearch/req-mtls"><i><u>cloudflareresearch/req-mtls</u></i></a></p><p>Because mutual TLS is widely supported in TLS libraries already, the parts we need to introduce to the client and server are:</p><ol><li><p>Sending/parsing of TLS-flags</p></li><li><p>Specific support for the <code>req mTLS</code> flag</p></li></ol><p>To the best of our knowledge, there is no complete public implementation of either scheme. Using it for bot authentication may provide a motivation to do so.</p><p>Using <a href="https://github.com/cloudflare/go"><u>our experimental fork of Go</u></a>, a TLS client could support req mTLS as follows:</p>
            <pre><code>config := &amp;tls.Config{
	TLSFlagsSupported: []tls.TLSFlag{0x50},
	RootCAs:           rootPool,
	Certificates:      certs,
	NextProtos:        []string{"h2"},
}
trans := http.Transport{TLSClientConfig: config, ForceAttemptHTTP2: true}</code></pre>
            <p>This configures our experimental Go fork to advertise flag 0x50 (<code>req mTLS</code>) in the <code>TLS Flags</code> extension. If you’d like to test your implementation, you can prompt your client for certificates against <a href="http://req-mtls.research.cloudflare.com"><u>req-mtls.research.cloudflare.com</u></a> using the Cloudflare Research client <a href="https://github.com/cloudflareresearch/req-mtls"><u>cloudflareresearch/req-mtls</u></a>. Once a client sets the TLS flag associated with <code>req mTLS</code>, it is done: the code path handling normal mTLS takes over from there, with no need to implement anything new.</p>
    <div>
      <h2>Two approaches, one goal</h2>
      <a href="#two-approaches-one-goal">
        
      </a>
    </div>
    <p>We believe that developers of agents and bots should have a public, standard way to authenticate themselves to CDNs and website hosting platforms, regardless of the technology they use or provider they choose. At a high level, both HTTP Message Signatures and request mTLS achieve a similar goal: they allow the owner of a service to authentically identify themselves to a website. That’s why we’re participating in the standardization effort for both of these protocols at the IETF, where many other authentication mechanisms we’ve discussed here, from TLS to OAuth Bearer tokens, have been developed by diverse sets of stakeholders and standardized as RFCs.</p><p>Evaluating both proposals against each other, we’re prioritizing <a href="https://datatracker.ietf.org/doc/html/draft-meunier-web-bot-auth-architecture"><u>HTTP Message Signatures for Bots</u></a> because it relies on the previously adopted <a href="https://datatracker.ietf.org/doc/html/rfc9421"><u>RFC 9421</u></a> with several <a href="https://httpsig.org/"><u>reference implementations</u></a>, and works at the HTTP layer, making adoption simpler. <a href="https://datatracker.ietf.org/doc/draft-jhoyla-req-mtls-flag/"><u>Request mTLS</u></a> may be a better fit for site owners concerned about the additional bandwidth, but <a href="https://datatracker.ietf.org/doc/html/draft-ietf-tls-tlsflags"><u>TLS Flags</u></a> has fewer implementations, is still awaiting IETF adoption, and upgrading the TLS stack has proven more challenging than upgrading HTTP. Both approaches share similar discovery and key management concerns, as highlighted in a <a href="https://datatracker.ietf.org/doc/draft-meunier-web-bot-auth-glossary/"><u>glossary</u></a> draft at the IETF. We’re actively exploring both options, and would love to <a href="https://www.cloudflare.com/lp/verified-bots/"><u>hear</u></a> from both site owners and bot developers about how you’re evaluating their respective tradeoffs.</p>
    <div>
      <h2>The bigger picture </h2>
      <a href="#the-bigger-picture">
        
      </a>
    </div>
    <p>In conclusion, we think request signatures and mTLS are promising mechanisms for bot owners and developers of AI agents to authenticate themselves in a tamper-proof manner, forging a path forward that doesn’t rely on ever-changing IP address ranges or spoofable headers such as <code>User-Agent</code>. This authentication can be consumed by Cloudflare when acting as a reverse proxy, or directly by site owners on their own infrastructure. This means that as a bot owner, you can now go to content creators and discuss crawling agreements, down to the granularity of individual bots. You can start implementing these solutions today and test them against the research websites we’ve provided in this post.</p><p>Bot authentication also empowers site owners small and large to have more control over the traffic they allow, enabling them to continue to serve content on the public Internet while monitoring automated requests. Longer term, we will integrate these authentication mechanisms into our <a href="https://blog.cloudflare.com/cloudflare-ai-audit-control-ai-content-crawlers/"><u>AI Audit</u></a> and <a href="https://developers.cloudflare.com/bots/get-started/bot-management/"><u>Bot Management</u></a> products, to provide better visibility into the bots and agents that are willing to identify themselves.</p><p>Being able to solve problems for both origins and clients is key to helping build a better Internet, and we think identification of automated traffic is a step towards that. If you want us to start verifying your message signatures or client certificates, have a compelling use case you’d like us to consider, or any questions, please <a href="https://www.cloudflare.com/lp/verified-bots/"><u>reach out</u></a>.</p>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[Bot Management]]></category>
            <category><![CDATA[AI Bots]]></category>
            <category><![CDATA[Cryptography]]></category>
            <guid isPermaLink="false">2hUP3FdePgIYVDwhgJVLeV</guid>
            <dc:creator>Thibault Meunier</dc:creator>
            <dc:creator>Mari Galicer</dc:creator>
        </item>
        <item>
            <title><![CDATA[Improved Bot Management flexibility and visibility with new high-precision heuristics]]></title>
            <link>https://blog.cloudflare.com/bots-heuristics/</link>
            <pubDate>Wed, 19 Mar 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ By building and integrating a new heuristics framework into the Cloudflare Ruleset Engine, we now have a more flexible system to write rules and deploy new releases rapidly. ]]></description>
            <content:encoded><![CDATA[ <p>Within the Cloudflare Application Security team, every <a href="https://www.cloudflare.com/learning/ai/what-is-machine-learning/"><u>machine learning</u></a> model we use is underpinned by a rich set of static rules that serve as a ground truth and a baseline comparison for how our models are performing. These are called heuristics. Our Bot Management heuristics engine has served as an important part of eight global <a href="https://developers.cloudflare.com/bots/concepts/bot-score/#machine-learning"><u>machine learning (ML) models</u></a>, but we needed a more expressive engine to increase our accuracy. In this post, we’ll review how we solved this by moving our heuristics to the Cloudflare <a href="https://developers.cloudflare.com/ruleset-engine/"><u>Ruleset Engine</u></a>. Not only did this provide the platform we needed to write more nuanced rules, it made our platform simpler and safer, and provided <a href="https://www.cloudflare.com/application-services/products/bot-management/"><u>Bot Management</u></a> customers more flexibility and visibility into their bot traffic.   </p>
    <div>
      <h3>Bot detection via simple heuristics</h3>
      <a href="#bot-detection-via-simple-heuristics">
        
      </a>
    </div>
    <p>In Cloudflare’s bot detection, we build heuristics from attributes like software library fingerprints, HTTP request characteristics, and internal threat intelligence. Heuristics serve three separate purposes for bot detection:</p><ol><li><p>Identify bots: If traffic matches a heuristic, we can identify the traffic as definitely automated (with a <a href="https://developers.cloudflare.com/bots/concepts/bot-score/"><u>bot score</u></a> of 1) without needing a machine learning model.</p></li><li><p>Train ML models: When traffic matches our heuristics, we create labelled datasets of bot traffic to train new models. We use many different sources of labelled bot traffic to train a new model, but our heuristics datasets are one of the highest-confidence datasets available to us.</p></li><li><p>Validate models: We benchmark any new model candidate’s performance against our heuristic detections (among many other checks) to make sure it meets a required level of accuracy.</p></li></ol><p>While the existing heuristics engine has worked very well for us, as bots evolved we needed the flexibility to write increasingly complex rules, which the old engine could not easily support. Customers have also been asking for more details about which specific heuristic caught a request, and for the flexibility to enforce different policies per heuristic ID. By building a new heuristics framework integrated into the Cloudflare Ruleset Engine, we could build a more flexible system for writing rules and give Bot Management customers the granular explainability and control they were asking for.</p>
    <div>
      <h3>The need for more efficient, precise rules</h3>
      <a href="#the-need-for-more-efficient-precise-rules">
        
      </a>
    </div>
    <p>In our previous heuristics engine, we wrote rules in <a href="https://www.lua.org/"><u>Lua</u></a> as part of our <a href="https://openresty.org/"><u>openresty</u></a>-based reverse proxy. The Lua-based engine was limited to a very small number of characteristics in a rule because of the high engineering cost we observed with adding more complexity.</p><p>With Lua, we would write fairly simple logic to match on specific characteristics of a request (e.g. the user agent). Creating a new heuristic of an existing class was fairly straightforward: all we’d need to do is define another instance of the existing class in our database. However, if we observed malicious traffic that required more than two characteristics (as a simple example, user-agent and <a href="https://en.wikipedia.org/wiki/Autonomous_system_(Internet)"><u>ASN</u></a>) to identify, we’d need to create bespoke logic for detections. Because our Lua heuristics engine was bundled with the code that ran ML models and other important logic, all changes had to go through the same review and release process. If we identified malicious traffic that needed a new heuristic class, and we were also blocked by pending changes in the codebase, we’d be forced to either wait or roll back the changes. If we’re writing a new rule for an “under attack” scenario, every extra minute it takes to deploy can mean an unacceptable impact to our customers’ business.</p><p>More critical than time to deploy is the complexity the heuristics engine supports. The old heuristics engine only supported using specific request attributes when creating a new rule. As bots became more sophisticated, we found we had to reject an increasing number of new heuristic candidates because we weren’t able to write precise enough rules. For example, we found a <a href="https://go.dev/"><u>Golang</u></a> TLS fingerprint frequently used by bots and by a small number of corporate VPNs. We couldn’t block the bots without also stopping the legitimate VPN usage, because the old heuristics platform lacked the flexibility to quickly compile sufficiently nuanced rules. Luckily, we already had the perfect solution in the Cloudflare Ruleset Engine.</p>
    <div>
      <h3>Our new heuristics engine</h3>
      <a href="#our-new-heuristics-engine">
        
      </a>
    </div>
    <p>The Ruleset Engine is familiar to anyone who has written a WAF rule, Load Balancing rule, or Transform rule, <a href="https://blog.cloudflare.com/announcing-firewall-rules/"><u>just to name a few</u></a>. For Bot Management, the Wireshark-inspired syntax allows us to quickly write heuristics with much greater flexibility to vastly improve accuracy. We can write a rule in <a href="https://yaml.org/"><u>YAML</u></a> that includes arbitrary sub-conditions and inherit the same framework the WAF team uses to both ensure any new rule undergoes a rigorous testing process with the ability to rapidly release new rules to stop attacks in real-time. </p><p>Writing heuristics on the Cloudflare Ruleset Engine allows our engineers and analysts to write new rules in an easy to understand YAML syntax. This is critical to supporting a rapid response in under attack scenarios, especially as we support greater rule complexity. Here’s a simple rule using the new engine, to detect empty user-agents restricted to a specific JA4 fingerprint (right), compared to the empty user-agent detection in the old Lua based system (left): </p><table><tr><td><p><b>Old</b></p></td><td><p><b>New</b></p></td></tr><tr><td><p><code>local _M = {}</code></p><p><code>local EmptyUserAgentHeuristic = {</code></p><p><code>   heuristic = {},</code></p><p><code>}</code></p><p><code>EmptyUserAgentHeuristic.__index = EmptyUserAgentHeuristic</code></p><p><code>--- Creates and returns empty user agent heuristic</code></p><p><code>-- @param params table contains parameters injected into EmptyUserAgentHeuristic</code></p><p><code>-- @return EmptyUserAgentHeuristic table</code></p><p><code>function _M.new(params)</code></p><p><code>   return setmetatable(params, EmptyUserAgentHeuristic)</code></p><p><code>end</code></p><p><code>--- Adds heuristic to be used for inference in `detect` method</code></p><p><code>-- @param heuristic schema.Heuristic table</code></p><p><code>function 
EmptyUserAgentHeuristic:add(heuristic)</code></p><p><code>   self.heuristic = heuristic</code></p><p><code>end</code></p><p><code>--- Detect runs empty user agent heuristic detection</code></p><p><code>-- @param ctx context of request</code></p><p><code>-- @return schema.Heuristic table on successful detection or nil otherwise</code></p><p><code>function EmptyUserAgentHeuristic:detect(ctx)</code></p><p><code>   local ua = ctx.user_agent</code></p><p><code>   if not ua or ua == '' then</code></p><p><code>      return self.heuristic</code></p><p><code>   end</code></p><p><code>end</code></p><p><code>return _M</code></p></td><td><p><code>ref: empty-user-agent</code></p><p><code>      description: Empty or missing </code></p><p><code>User-Agent header</code></p><p><code>      action: add_bot_detection</code></p><p><code>      action_parameters:</code></p><p><code>        active_mode: false</code></p><p><code>      expression: http.user_agent eq </code></p><p><code>"" and cf.bot_management.ja4 = "t13d1516h2_8daaf6152771_b186095e22b6"</code></p></td></tr></table><p>The Golang heuristic that captured corporate proxy traffic as well (mentioned above) was one of the first to migrate to the new Ruleset engine. Before the migration, traffic matching on this heuristic had a false positive rate of 0.01%. While that sounds like a very small number, this means for every million bots we block, 100 real users saw a Cloudflare challenge page unnecessarily. At Cloudflare scale, even small issues can have real, negative impact.</p><p>When we analyzed the traffic caught by this heuristic rule in depth, we saw the vast majority of attack traffic came from a small number of abusive networks. After narrowing the definition of the heuristic to flag the Golang fingerprint only when it’s sourced by the abusive networks, the rule now has a false positive rate of 0.0001% (One out of 1 million).  
Updating the heuristic to include the network context improved our accuracy, while still blocking millions of bots every week and giving us plenty of training data for our bot detection models. Because this heuristic is now more accurate, newer ML models make more accurate decisions on what’s a bot and what isn’t.</p>
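<p>The rates above translate into expected counts like this (a trivial sketch; results are rounded because the percentages are floating-point):</p>

```typescript
// Expected false positives per million blocked requests, given a
// false-positive rate expressed as a percentage.
const falsePositivesPerMillion = (ratePercent: number): number =>
  Math.round((ratePercent / 100) * 1_000_000);

falsePositivesPerMillion(0.01); // 100: a hundred real users per million blocks
falsePositivesPerMillion(0.0001); // 1: one real user per million blocks
```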
    <div>
      <h3>New visibility and flexibility for Bot Management customers </h3>
      <a href="#new-visibility-and-flexibility-for-bot-management-customers">
        
      </a>
    </div>
    <p>While the new heuristics engine provides more accurate detections for all customers and a better experience for our analysts, moving to the Cloudflare Ruleset Engine also allows us to deliver new functionality for Enterprise Bot Management customers: more visibility, via a new field called Bot Detection IDs. Every heuristic we use includes a unique Bot Detection ID. These are visible to Bot Management customers in analytics, logs, and firewall events, and they can be used in the firewall to write precise rules for individual bots.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3cYXHw8tFUjdJQGm93gFcE/0a3f6ab89a70410ebb7dd2c6f4f3a677/1.png" />
          </figure>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/9d7L1r7yN9AEhO26H7SgQ/434f0f934dd4f4498a8d13e85a7660ae/2.png" />
          </figure><p>Detections also include a specific tag describing the class of heuristic. Customers see these plotted over time in their analytics. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2UlkGQ070UWsmU76IXYqDd/6bca8f28959a8fe8f3013792a17b348a/image4.png" />
          </figure><p>To illustrate how this data can help give customers visibility into why we blocked a request, here’s an example request flagged by Bot Management (with the IP address, ASN, and country changed):</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3i6k9ejsBVwlbXUrZFIFJG/4c9cddd11d9f7a8ddf10eb5ff30a027b/4.png" />
          </figure><p>Before, seeing only that our heuristics gave the request a score of 1 did little to explain why it was flagged as a bot. Adding Detection IDs to Firewall Events paints a clearer picture for customers: we identified this request as a bot because the traffic used an empty user-agent.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5UQd20VnXCnIav1skHiX6i/18e0cf01106601ae7faf18be50d43365/5.png" />
          </figure><p>In addition to Analytics and Firewall Events, Bot Detection IDs are now available for Bot Management customers to use in Custom Rules, Rate Limiting Rules, Transform Rules, and Workers. </p>
    <div>
      <h4>Account takeover detection IDs</h4>
      <a href="#account-takeover-detection-ids">
        
      </a>
    </div>
    <p>One way we’re focused on improving Bot Management for our customers is by surfacing more attack-specific detections. During Birthday Week, we <a href="https://blog.cloudflare.com/a-safer-internet-with-cloudflare/"><u>launched Leaked Credentials Check</u></a> for all customers so that security teams could help <a href="https://www.cloudflare.com/zero-trust/solutions/account-takeover-prevention/"><u>prevent account takeover (ATO) attacks</u></a> by identifying accounts at risk due to leaked credentials. We’ve now added two more detections that can help Bot Management enterprise customers identify suspicious login activity via specific <a href="https://developers.cloudflare.com/bots/concepts/detection-ids/#account-takeover-detections"><u>detection IDs</u></a> that monitor login attempts and failures on the zone. These detection IDs do not currently affect the bot score, but will begin to later in 2025; in the meantime, they can already help customers detect account takeover events.</p><p>Detection ID <b>201326592</b> monitors traffic on a customer website and looks for an anomalous rise in login failures (usually associated with brute-force attacks), and ID <b>201326593</b> looks for an anomalous rise in login attempts (usually associated with credential stuffing).</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4a2LGgB1bXGwFgHQEKFSI9/ff4194a6a470e466274f7b7c51e9dc18/6.png" />
          </figure>
    <div>
      <h3>Protect your applications</h3>
      <a href="#protect-your-applications">
        
      </a>
    </div>
    <p>If you are a Bot Management customer, log in to the Cloudflare dashboard and look in Security Analytics for bot detection IDs <code>201326592</code> and <code>201326593</code>.</p><p>These will highlight ATO attempts targeting your site. If you spot anything suspicious, or would like to be protected against future attacks, create a rule that uses these detections to keep your application safe.</p> ]]></content:encoded>
            <category><![CDATA[Security Week]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[Bot Management]]></category>
            <category><![CDATA[Application Security]]></category>
            <category><![CDATA[Machine Learning]]></category>
            <category><![CDATA[Heuristics]]></category>
            <category><![CDATA[Edge Rules]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">4IkgXzyemEEsN7A6Cd18hb</guid>
            <dc:creator>Curtis Lowder</dc:creator>
            <dc:creator>Brian Mitchell</dc:creator>
            <dc:creator>Adam Martinetti</dc:creator>
        </item>
        <item>
            <title><![CDATA[Trapping misbehaving bots in an AI Labyrinth]]></title>
            <link>https://blog.cloudflare.com/ai-labyrinth/</link>
            <pubDate>Wed, 19 Mar 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ How Cloudflare uses generative AI to slow down, confuse, and waste the resources of AI Crawlers and other bots that don’t respect “no crawl” directives. ]]></description>
            <content:encoded><![CDATA[ <p>Today, we’re excited to announce AI Labyrinth, a new mitigation approach that uses AI-generated content to slow down, confuse, and waste the resources of AI Crawlers and other bots that don’t respect “no crawl” directives. When you opt in, Cloudflare will automatically deploy an AI-generated set of linked pages when we detect inappropriate bot activity, without the need for customers to create any custom rules.</p><p>AI Labyrinth is available on an opt-in basis to all customers, including the<a href="https://www.cloudflare.com/plans/free/"> Free plan</a>.</p>
    <div>
      <h3>Using Generative AI as a defensive weapon</h3>
      <a href="#using-generative-ai-as-a-defensive-weapon">
        
      </a>
    </div>
    <p>AI-generated content has exploded, reportedly accounting for <a href="https://www.thetimes.co.uk/article/why-ai-content-everywhere-how-to-detect-l2m2kdx9p"><u>four of the top 20 Facebook posts</u></a> last fall. Additionally, Medium estimates that <a href="https://www.wired.com/story/ai-generated-medium-posts-content-moderation/"><u>47% of all content</u></a> on their platform is AI-generated. Like any newer tool, it has both wonderful and <a href="https://www.npr.org/2024/12/24/nx-s1-5235265/how-to-protect-yourself-from-holiday-ai-scams"><u>malicious</u></a> uses.</p><p>At the same time, we’ve also seen an explosion of new crawlers used by AI companies to scrape data for model training. AI Crawlers generate more than 50 billion requests to the Cloudflare network every day, or just under 1% of all web requests we see. While Cloudflare has several tools for <a href="https://www.cloudflare.com/learning/ai/how-to-block-ai-crawlers/"><u>identifying and blocking unauthorized AI crawling</u></a>, we have found that blocking malicious <a href="https://www.cloudflare.com/learning/bots/what-is-a-bot/">bots</a> can alert the attacker that you are on to them, leading to a shift in approach and a never-ending arms race. So, we wanted to create a new way to thwart these unwanted bots, without letting them know they’ve been thwarted.</p><p>To do this, we decided to use a new offensive tool in the bot creator’s toolset that we haven’t really seen used defensively: AI-generated content. When we detect unauthorized crawling, rather than blocking the request, we will link to a series of AI-generated pages that are convincing enough to entice a crawler to traverse them. But while real-looking, this content is not actually the content of the site we are protecting, so the crawler wastes time and resources. </p><p>As an added benefit, AI Labyrinth also acts as a next-generation honeypot. No real human would go four links deep into a maze of AI-generated nonsense.
Any visitor that does is very likely to be a bot, so this gives us a brand-new tool to identify and fingerprint bad bots, which we add to our list of known bad actors. Here’s how we do it…</p>
    <div>
      <h3>How we built the labyrinth </h3>
      <a href="#how-we-built-the-labyrinth">
        
      </a>
    </div>
    <p>When AI crawlers follow these links, they waste valuable computational resources processing irrelevant content rather than extracting your legitimate website data. This significantly reduces their ability to gather enough useful information to train their models effectively.</p><p>To generate convincing human-like content, we used <a href="https://developers.cloudflare.com/workers-ai/"><u>Workers AI</u></a> with an open source model to create unique HTML pages on diverse topics. Rather than creating this content on-demand (which could impact performance), we implemented a pre-generation pipeline that sanitizes the content to<a href="https://www.cloudflare.com/learning/security/how-to-prevent-xss-attacks/"> prevent any XSS vulnerabilities</a>, and stores it in <a href="https://www.cloudflare.com/developer-platform/products/r2/">R2</a> for faster retrieval. We found that generating a diverse set of topics first, then creating content for each topic, produced more varied and convincing results. It is important to us that we don’t generate inaccurate content that contributes to the spread of misinformation on the Internet, so the content we generate is real and related to scientific facts, just not relevant or proprietary to the site being crawled.</p><p>This pre-generated content is seamlessly integrated as hidden links on existing pages via our custom HTML transformation process, without disrupting the original structure or content of the page. Each generated page includes appropriate meta directives to protect SEO by preventing search engine indexing. We also ensured that these links remain invisible to human visitors through carefully implemented attributes and styling. To further minimize the impact to regular visitors, we ensured that these links are presented only to suspected AI scrapers, while allowing legitimate users and verified crawlers to browse normally.</p>
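<p>As a rough sketch of the pre-generation approach described above (illustrative only: <code>generate_text</code> stands in for a Workers AI model call, and a plain dictionary stands in for R2 storage):</p>

```python
import html
import re

def generate_text(topic):
    # Stand-in for a text-generation model call; the hostile <script> tag
    # simulates model output that must be scrubbed before storage.
    return f"<p>Facts about {html.escape(topic)}.</p><script>alert(1)</script>"

def sanitize(markup):
    # Minimal XSS scrub for the sketch: strip script/style blocks and inline
    # on* event handlers. A real pipeline would use a proper HTML sanitizer.
    markup = re.sub(r"(?is)<(script|style)\b.*?</\1>", "", markup)
    markup = re.sub(r"(?i)\son\w+\s*=\s*(\"[^\"]*\"|'[^']*'|\S+)", "", markup)
    return markup

def pregenerate(topics, store):
    # Topic-first generation: pick diverse topics, then render one page per
    # topic ahead of time, so nothing is generated on the request path.
    for topic in topics:
        store[f"labyrinth/{topic}.html"] = sanitize(generate_text(topic))
    return store

pages = pregenerate(["soil chemistry", "tidal patterns"], {})
```

<p>Pre-generating and storing the pages keeps retrieval fast at request time, which is why generation happens offline rather than on demand.</p>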
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2PHSCXVMFipAhGJ5IheXW3/a46aad93f2e60f6d892d4c597a752a58/image4.png" />
          </figure><p><sup><i>A graph of daily requests over time, comparing different categories of AI Crawlers.</i></sup></p><p>What makes this approach particularly effective is its role in our continuously evolving bot detection system. When these links are followed, we know with high confidence that it's automated crawler activity, as human visitors and legitimate browsers would never see or click them. This provides us with a powerful identification mechanism, generating valuable data that feeds into our <a href="https://www.cloudflare.com/learning/ai/what-is-machine-learning/">machine learning models</a>. By analyzing which crawlers are following these hidden pathways, we can identify new bot patterns and signatures that might otherwise go undetected. This proactive approach helps us <a href="https://www.cloudflare.com/learning/ai/how-to-prevent-web-scraping/">stay ahead of AI scrapers</a>, continuously improving our detection capabilities without disrupting the normal browsing experience.</p><p>By building this solution on our developer platform, we've created a system that serves convincing decoy content instantly while maintaining consistent quality - all without impacting your site's performance or user experience.</p>
    <div>
      <h3>How to use AI Labyrinth to stop AI crawlers</h3>
      <a href="#how-to-use-ai-labyrinth-to-stop-ai-crawlers">
        
      </a>
    </div>
    <p>Enabling AI Labyrinth is simple and requires just a single toggle in your Cloudflare dashboard. Navigate to the bot management section within your zone, and toggle the new AI Labyrinth setting to on:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/q1ZQlnnMztSsK8PWD1h0S/ef02f081544dc751f754e9630f17261e/image1.png" />
          </figure>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/61qBVDv0WFh8YzrbVULtxq/13ec46d7651c59454f9fe3754e253b85/image3.png" />
          </figure><p>Once enabled, the AI Labyrinth begins working immediately with no additional configuration needed.</p>
    <div>
      <h3>AI honeypots, created by AI</h3>
      <a href="#ai-honeypots-created-by-ai">
        
      </a>
    </div>
    <p>The core benefit of AI Labyrinth is to confuse and distract bots. However, a secondary benefit is to serve as a next-generation honeypot. In this context, a honeypot is just an invisible link that a website visitor can’t see, but a bot parsing HTML would see and click on, thereby revealing itself to be a bot. Honeypots have been used to catch hackers as early as the <a href="https://medium.com/@jcart657/the-cuckoos-egg-9b502442ea67"><u>1986 Cuckoo’s Egg incident</u></a>. And in 2004, <a href="https://www.projecthoneypot.org/"><u>Project Honeypot</u></a> was created by Cloudflare founders (prior to founding Cloudflare) to let everyone easily deploy free email honeypots, and receive lists of crawler IPs in exchange for contributing to the database. But as bots have evolved, they now proactively look for honeypot techniques like hidden links, making this approach less effective.</p><p>AI Labyrinth won’t simply add invisible links, but will eventually create whole networks of linked URLs that are much more realistic, and not trivial for automated programs to spot. The content on the pages is obviously content no human would spend time consuming, but AI bots are programmed to crawl rather deeply to harvest as much data as possible. When bots hit these URLs, we can be confident they aren’t actual humans, and this information is recorded and automatically fed to our machine learning models to help improve our bot identification. This creates a beneficial feedback loop where each scraping attempt helps protect all Cloudflare customers.</p>
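<p>The classic hidden-link honeypot described above can be sketched in a few lines (the attribute and URL choices here are hypothetical, not Cloudflare’s actual markup):</p>

```python
def honeypot_link(decoy_url):
    # Invisible to humans: moved off-screen, skipped by keyboard focus and
    # screen readers. A naive crawler parsing raw HTML still sees the href
    # and follows it, identifying itself as a bot.
    return (
        f'<a href="{decoy_url}" rel="nofollow" tabindex="-1" '
        f'aria-hidden="true" style="position:absolute;left:-9999px">.</a>'
    )

snippet = honeypot_link("/labyrinth/start")
```

<p>Because modern bots scan for exactly these patterns, AI Labyrinth moves beyond single hidden links toward realistic networks of linked content.</p>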
    <div>
      <h3>What’s next</h3>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>This is only our first iteration of using generative AI to thwart bots. Currently, while the content we generate is convincingly human, it won’t conform to the existing structure of every website. In the future, we’ll continue working to make these links harder to spot and to fit them seamlessly into the existing structure of the website they’re embedded in. You can help us by opting in now.</p><p>To take the next step in the fight against bots, <a href="https://dash.cloudflare.com/?to=/:account/:zone/security/detections/bot-traffic"><u>opt in to AI Labyrinth</u></a> today.</p> ]]></content:encoded>
            <category><![CDATA[Security Week]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[Bot Management]]></category>
            <category><![CDATA[AI Bots]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Machine Learning]]></category>
            <category><![CDATA[Generative AI]]></category>
            <guid isPermaLink="false">1Zh4fm4BB1S3xuVwfETiTE</guid>
            <dc:creator>Reid Tatoris</dc:creator>
            <dc:creator>Harsh Saxena</dc:creator>
            <dc:creator>Luis Miglietti</dc:creator>
        </item>
        <item>
            <title><![CDATA[Extending Cloudflare Radar’s security insights with new DDoS, leaked credentials, and bots datasets]]></title>
            <link>https://blog.cloudflare.com/cloudflare-radar-ddos-leaked-credentials-bots/</link>
            <pubDate>Tue, 18 Mar 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ For Security Week 2025, we are adding several new DDoS-focused graphs, new insights into leaked credential trends, and a new Bots page to Cloudflare Radar.  ]]></description>
            <content:encoded><![CDATA[ <p>The security and attack landscape continues to be very active, and the visibility that Cloudflare Radar provides into this dynamic landscape has evolved and expanded over time. To that end, during 2023’s Security Week, we <a href="https://blog.cloudflare.com/radar-url-scanner-early-access/"><u>launched our URL Scanner</u></a>, which enables users to safely scan any URL to determine if it is safe to view or interact with. During 2024’s Security Week, we <a href="https://blog.cloudflare.com/email-security-insights-on-cloudflare-radar/"><u>launched an Email Security page</u></a>, which provides a unique perspective on the threats posed by malicious emails, spam volume, the adoption of <a href="https://www.cloudflare.com/learning/email-security/dmarc-dkim-spf/"><u>email authentication methods like SPF, DMARC, and DKIM</u></a>, and the use of IPv4/IPv6 and TLS by email servers. For Security Week 2025, we are adding several new DDoS-focused graphs, new insights into leaked credential trends, and a new <b>Bots</b> page to Cloudflare Radar. We are also taking this opportunity to <a href="https://www.cloudflare.com/learning/cloud/how-to-refactor-applications/">refactor</a> Radar’s <b>Security &amp; Attacks</b> page, breaking it out into <b>Application Layer</b> and <b>Network Layer</b> sections.</p><p>Below, we review all of these changes and additions to Radar.</p>
    <div>
      <h3>Layered security</h3>
      <a href="#layered-security">
        
      </a>
    </div>
    <p>Since Cloudflare Radar launched in 2020, it has included both network layer (Layers 3 &amp; 4) and application layer (Layer 7) attack traffic insights on a single <b>Security &amp; Attacks</b> page. Over the last four-plus years, we have evolved some of the existing data sets on the page, as well as adding new ones. As the page has grown and improved over time, it risked becoming unwieldy to navigate, making it hard to find the graphs and data of interest. To help address that, the <b>Security</b> section on Radar now features separate <a href="https://radar.cloudflare.com/security/application-layer"><b><u>Application Layer</u></b></a> and <a href="https://radar.cloudflare.com/security/network-layer"><b><u>Network Layer</u></b></a> pages. The <b>Application Layer</b> page is the default, and includes insights from analysis of HTTP-based malicious and attack traffic. The <b>Network Layer</b> page includes insights from analysis of network and transport layer attacks, as well as observed <a href="https://blog.cloudflare.com/tcp-resets-timeouts/"><u>TCP resets and timeouts</u></a>. Future security and attack-related data sets will be added to the relevant page. <a href="https://radar.cloudflare.com/email-security"><b><u>Email Security</u></b></a> remains on its own dedicated page.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3zxFA8iG7N9MZQzAQAaCNf/ee92891b74b0d70052cc43239dad656f/image__2_.png" />
          </figure>
    <div>
      <h3>A geographic and network view of application layer DDoS attacks</h3>
      <a href="#a-geographic-and-network-view-of-application-layer-ddos-attacks">
        
      </a>
    </div>
    <p>Radar’s <a href="https://radar.cloudflare.com/reports"><u>quarterly DDoS threat reports</u></a> have historically provided insights, aggregated on a quarterly basis, into the top source and target locations of application layer DDoS attacks. A <a href="https://radar.cloudflare.com/security/application-layer#application-layer-ddos-attacks-distribution"><u>new map and table</u></a> on Radar’s <b>Application Layer</b> Security page now provide more timely insights, with a global choropleth map showing a geographical distribution of source and target locations, and an accompanying list of the top 20 locations by share of all DDoS requests. Source location attribution continues to rely on the geolocation of the IP address originating the blocked request, while target location remains the billing location of the account that owns the site being attacked. </p><p>Over the first week of March 2025, the United States, Indonesia, and Germany were the top sources of <a href="https://www.cloudflare.com/learning/ddos/application-layer-ddos-attack/">application layer DDoS attacks</a>, together accounting for over 30% of such attacks as shown below. The concentration across the top targeted locations was quite different, with customers from Canada, the United States, and Singapore attracting 56% of application layer DDoS attacks.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7nW8CyC5L4s68ntQEe1pfT/a8b571b9b2325f5f71936367e8879af1/image10.png" />
          </figure><p>In addition to extended visibility into the geographic source of application layer DDoS attacks, we have also added <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/"><u>autonomous system (AS)</u></a>-level visibility. A <a href="https://radar.cloudflare.com/security/application-layer#application-layer-ddos-attacks-source-as-distribution"><u>new treemap</u></a> view shows the distribution of these attacks by source AS. At a global level, the largest sources include cloud/hosting providers in Germany, the United States, China, and Vietnam.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/332A6OGGKUNCE0wLVBKBmd/2b93c6bd46105602214d7ced2ead71b6/image7.png" />
          </figure><p>For a selected country/region, the treemap displays a source AS distribution for attacks observed to be originating from that location. In some, the sources of attack traffic are heavily concentrated in consumer/business network providers, such as in <a href="https://radar.cloudflare.com/security/application-layer/pt#application-layer-ddos-attacks-source-as-distribution"><u>Portugal</u></a>, shown below. However, in other countries/regions that have a large cloud provider presence, such as <a href="https://radar.cloudflare.com/security/application-layer/ie#application-layer-ddos-attacks-source-as-distribution"><u>Ireland</u></a>, <a href="https://radar.cloudflare.com/security/application-layer/sg#application-layer-ddos-attacks-source-as-distribution"><u>Singapore</u></a>, and the <a href="https://radar.cloudflare.com/security/application-layer/us#application-layer-ddos-attacks-source-as-distribution"><u>United States</u></a>, ASNs associated with these types of providers are the dominant sources. To that end, Singapore was listed as being among the top sources of application layer DDoS attacks in each of the quarterly DDoS threat reports in 2024. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3RpwYjFjmTR2WFx5UpsUuS/5211c6e5afd4733185ba1ed03750d1d2/image3.png" />
          </figure>
    <div>
      <h3>Have you been pwned?</h3>
      <a href="#have-you-been-pwned">
        
      </a>
    </div>
    <p>Every week, it seems like there’s another headline about a <a href="https://www.cloudflare.com/learning/security/what-is-a-data-breach/">data breach</a>, talking about thousands or millions of usernames and passwords being stolen. Or maybe you get an email from an identity monitoring service that your username and password were found on the “dark web”. (Of course, you’re getting those alerts thanks to a complementary subscription to the service offered as penance from another data breach…)</p><p>This credential theft is especially problematic because people often reuse passwords, despite best practices advising the use of strong, unique passwords for each site or application. To help mitigate this risk, <a href="https://blog.cloudflare.com/a-safer-internet-with-cloudflare/#account-takeover-detection"><u>starting in 2024</u></a>, Cloudflare began enabling customers to scan authentication requests for their websites and applications using a <a href="https://blog.cloudflare.com/privacy-preserving-compromised-credential-checking/"><u>privacy-preserving</u></a> compromised credential checker implementation to detect known-leaked usernames and passwords. Today, we're using aggregated data to display trends in how often these leaked and stolen credentials are observed across Cloudflare's network. (Here, we are defining “leaked credentials” as usernames or passwords being found in a public dataset, or the username and password detected as being similar.)</p><p><a href="https://developers.cloudflare.com/waf/detections/leaked-credentials/#how-it-works"><u>Leaked credentials detection</u></a> scans incoming HTTP requests for known authentication patterns from common web apps and any custom detection locations that were configured. The service uses a privacy-preserving compromised credential checking protocol to compare a hash of the detected passwords to hashes of compromised passwords found in databases of leaked credentials. 
A <a href="https://radar.cloudflare.com/security/application-layer#leaked-credentials-usage"><u>new Radar graph</u></a> on the worldwide <b>Application Layer</b> Security page provides visibility into aggregate trends around the detection of leaked credentials in authentication requests. Filterable by authentication requests from human users, bots, or all (human + bot), the graph shows the distribution of requests classified as “clean” (no leaked credentials detected) and “compromised” (leaked credentials, as defined above, were used). At a worldwide level, we found that for the first week of March 2025, leaked credentials were used in 64% of all, over 65% of bot, and over 44% of human authentication requests. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5r8sHbOTrQ1ceGpLa0d3Dn/4ea9ab5b342cd394bd096349d4907ab0/image6.png" />
          </figure><p>This suggests that from a human perspective, password reuse is still a problem, as is users not taking immediate actions to change passwords when notified of a breach. And from a bot perspective, this suggests that attackers know that there is a good chance that leaked credentials for one website or application will enable them to access that same user’s account elsewhere.</p><p>As a complement to the leaked credentials data, Radar is also now providing a worldwide view into the <a href="https://radar.cloudflare.com/security/application-layer#bot-vs-human"><u>share of authentication requests originating from bots</u></a>. Note that not all of these requests are necessarily malicious — while some may be associated with <a href="https://www.cloudflare.com/learning/bots/what-is-credential-stuffing/">credential stuffing-style attacks</a>, others may be from automated scripts or other benign applications accessing an authentication endpoint. (Having said that, automated malicious attack request volume far exceeds legitimate automated login attempts.) During the first week of March 2025, we found that over 94% of authentication requests came from bots (were automated), with the balance coming from humans. Over that same period, <a href="https://radar.cloudflare.com/traffic?dateStart=2025-03-01&amp;dateEnd=2025-03-07#bot-vs-human"><u>bot traffic only accounted for 30% of overall requests</u></a>. So although bots don’t represent a majority of request traffic, authentication requests appear to comprise a significant portion of their activity.</p>
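<p>The privacy-preserving idea behind the compromised credential check can be illustrated with a k-anonymity-style range query, in which the server only ever sees a short hash prefix, never the full password hash. (This is a simplified sketch for intuition; Cloudflare’s deployed protocol is more sophisticated.)</p>

```python
import hashlib

# Toy breach database of SHA-1 password hashes (server side).
BREACH_DB = {hashlib.sha1(p.encode()).hexdigest().upper()
             for p in ["password", "letmein", "hunter2"]}

def bucket_for(prefix):
    # Server side: return the suffix of every breached hash sharing the
    # 5-character prefix. The server never learns which one (if any) matched.
    return {h[5:] for h in BREACH_DB if h.startswith(prefix)}

def is_compromised(password):
    # Client side: hash locally and reveal only the prefix; the membership
    # test happens on the client against the returned bucket of suffixes.
    digest = hashlib.sha1(password.encode()).hexdigest().upper()
    return digest[5:] in bucket_for(digest[:5])
```

<p>The key design choice is that the match is decided client-side: the server answers a prefix query with a whole bucket, so it cannot tell which credential was being checked.</p>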
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/20tYE9sH7SOnJDsnh28ZYz/3c33320a0a7406a348c4b92d22f469ed/image4.png" />
          </figure>
    <div>
      <h3>Bots get a dedicated page</h3>
      <a href="#bots-get-a-dedicated-page">
        
      </a>
    </div>
    <p>As a reminder, <a href="https://www.cloudflare.com/learning/bots/what-is-a-bot/"><u>bot</u></a> traffic describes any non-human Internet traffic, and monitoring bot levels can help spot potential malicious activities. Of course, bots can be helpful too, and Cloudflare maintains a list of <a href="https://radar.cloudflare.com/bots#verified-bots"><u>verified bots</u></a> to help keep the Internet healthy. Given the importance of monitoring bot activity, we have launched a new dedicated <a href="https://radar.cloudflare.com/bots"><b><u>Bots</u></b></a> page in the Traffic section of Cloudflare Radar to support these efforts. For both worldwide and location views over the selected time period, the page shows the distribution of bot (automated) vs. human HTTP requests, as well as a graph showing bot traffic trends. (Our <a href="https://developers.cloudflare.com/bots/concepts/bot-score/"><u>bot score</u></a>, combining machine learning, heuristics, and other techniques, is used to identify automated requests likely to be coming from bots.) </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5bvoSNcGRbc2I5RGhZoIdo/948eca1e82148a1db6130295c0ea2f42/image2.png" />
          </figure><p>Both the <a href="https://radar.cloudflare.com/year-in-review/2023/#bot-traffic-sources"><u>2023</u></a> and <a href="https://radar.cloudflare.com/year-in-review/2024#bot-traffic-sources"><u>2024</u></a> Cloudflare Radar Year in Review microsites included a “Bot Traffic Sources” section, showing the locations and networks from which Cloudflare determined the largest shares of <a href="https://developers.cloudflare.com/bots/concepts/bot-score/"><u>automated/likely automated</u></a> traffic were originating. However, these traffic shares were published just once a year, aggregating traffic from January through the end of November.</p><p>In order to provide a more timely perspective, these insights are now available on the new Radar Bots page. Similar to the new <a href="https://www.cloudflare.com/learning/ddos/what-is-a-ddos-attack/">DDoS attacks</a> content discussed above, the <a href="https://radar.cloudflare.com/bots#bot-traffic-sources"><u>worldwide view</u></a> includes a choropleth map and table illustrating the locations originating the largest shares of all bot traffic. (Note that a similar <a href="https://radar.cloudflare.com/traffic#traffic-characteristics"><u>Traffic Characteristics</u></a> map and table on the <a href="https://radar.cloudflare.com/traffic"><u>Traffic Overview page</u></a> ranks locations by the bot traffic share of the location’s total traffic.) Similar to Year in Review data linked above, the United States continues to originate the largest share of bot traffic.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/RPCiFgeihzYIbm4XIdQ1r/ae9afa59586266c785e4e9dfa8cb3428/image11.png" />
          </figure><p>In addition, the worldwide view also breaks out <a href="https://radar.cloudflare.com/bots#bot-traffic-share-by-autonomous-system"><u>bot traffic share by AS</u></a>, mirroring the treemap shown in the Year in Review. As we have noted <a href="https://blog.cloudflare.com/radar-2024-year-in-review/#the-united-states-was-responsible-for-over-a-third-of-global-bot-traffic-amazon-web-services-was-responsible-for-12-7-of-global-bot-traffic-and-7-8-came-from-google"><u>previously</u></a>, cloud platform providers account for a significant amount of bot traffic.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5CsXZNKOZ2ssiVaUzVVM51/5d8ad8b6a257eb0413be19fa101806c9/image8.png" />
          </figure><p>At a location level, depending on the country/region selected, the top sources of bot traffic may be cloud/hosting providers, consumer/business network providers, or a mix. For instance, <a href="https://radar.cloudflare.com/bots/fr#bot-traffic-sources"><u>France’s distribution</u></a> is shown below, and four ASNs account for just over half of the country’s bot traffic. Of these ASNs, two (<a href="https://radar.cloudflare.com/as16276"><u>AS16276</u></a> and <a href="https://radar.cloudflare.com/as12876"><u>AS12876</u></a>) belong to cloud/hosting providers, and two (<a href="https://radar.cloudflare.com/as3215"><u>AS3215</u></a> and <a href="https://radar.cloudflare.com/as12322"><u>AS12322</u></a>) belong to network providers.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4RM9zvPRa8NzfdTOFMvMcB/b04d53dd725d1890918668775b551004/image9.png" />
          </figure><p>In addition, the <a href="https://radar.cloudflare.com/bots#verified-bots"><u>Verified Bots list</u></a> has been moved to the new Bots page on Radar. The data shown and functionality remains unchanged, and links to the old location will automatically be redirected to the new one.</p>
    <div>
      <h3>Summary</h3>
      <a href="#summary">
        
      </a>
    </div>
    <p>The Cloudflare dashboard provides customers with specific views of security trends, application and network layer attacks, and bot activity across their sites and applications. While these views are useful at an individual customer level, aggregated views at a worldwide, location, and network level provide a macro-level perspective on trends and activity. These aggregated views available on Cloudflare Radar not only help customers understand how their observations compare to the larger whole, but they also help the industry understand emerging threats that may require action.</p><p>The underlying data for the graphs and data discussed above is available via the Radar API (<a href="https://developers.cloudflare.com/api/resources/radar/subresources/attacks/subresources/layer7/"><u>Application Layer</u></a>, <a href="https://developers.cloudflare.com/api/resources/radar/subresources/attacks/subresources/layer3/"><u>Network Layer</u></a>, <a href="https://developers.cloudflare.com/api/resources/radar/subresources/http/"><u>Bots</u></a>, <a href="https://developers.cloudflare.com/api/resources/radar/subresources/leaked_credentials/"><u>Leaked Credentials</u></a>). The data can also be interactively explored in more detail across locations, networks, and time periods using Radar’s <a href="https://radar.cloudflare.com/explorer"><u>Data Explorer and AI Assistant</u></a>. And as always, Radar and Data Explorer charts and graphs are downloadable for sharing, and embeddable for use in your own blog posts, websites, or dashboards.</p><p>If you share our security, attacks, or bots graphs on social media, be sure to tag us: <a href="https://x.com/CloudflareRadar"><u>@CloudflareRadar</u></a> and <a href="https://x.com/1111Resolver"><u>@1111Resolver</u></a> (X), <a href="https://noc.social/@cloudflareradar"><u>noc.social/@cloudflareradar</u></a> (Mastodon), and <a href="https://bsky.app/profile/radar.cloudflare.com"><u>radar.cloudflare.com</u></a> (Bluesky). 
If you have questions or comments, you can reach out to us on social media, or <a><u>contact us via email</u></a>.</p> ]]></content:encoded>
            <category><![CDATA[Security Week]]></category>
            <category><![CDATA[Radar]]></category>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[Passwords]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">4VnSmFMYvyiJqbBjhjo0DH</guid>
            <dc:creator>David Belson</dc:creator>
        </item>
    </channel>
</rss>