
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built and the technologies they use, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Thu, 09 Apr 2026 21:04:24 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Announcing Cloudflare Account Abuse Protection: prevent fraudulent attacks from bots and humans]]></title>
            <link>https://blog.cloudflare.com/account-abuse-protection/</link>
            <pubDate>Thu, 12 Mar 2026 05:00:00 GMT</pubDate>
            <description><![CDATA[ Blocking bots isn’t enough anymore. Cloudflare’s new fraud prevention capabilities — now available in Early Access — help stop account abuse before it starts. ]]></description>
            <content:encoded><![CDATA[ <p>Today, Cloudflare is introducing a new suite of fraud prevention capabilities designed to stop account abuse before it starts. We've spent years empowering Cloudflare customers to protect their applications from automated attacks, but the threat landscape has evolved. The industrialization of hybrid automated-and-human abuse presents a complex security challenge to website owners. Consider, for instance, a single account that’s accessed from New York, London, and San Francisco in the same five minutes. The core question in this case is not “Is this automated?” but rather “Is this authentic?” </p><p><b>Website owners need the tools to stop abuse on their website, no matter who it’s coming from</b>.</p><p>During our Birthday Week in 2024, we gifted <a href="https://developers.cloudflare.com/waf/detections/leaked-credentials/"><u>leaked credentials detection</u></a> to all customers, including everyone on a Free plan. Since then, we've added <a href="https://developers.cloudflare.com/bots/additional-configurations/detection-ids/#account-takeover-detections"><u>account takeover detection IDs</u></a> as part of our <a href="https://www.cloudflare.com/application-services/products/bot-management/"><u>bot management solution</u></a> to help identify bots attacking your login pages. </p><p>Now, we’re combining these powerful tools with new ones. <b>Disposable email check</b> and <b>email risk </b>help you enforce security preferences for users who sign up with throwaway email addresses, a common tactic for fake account creation and promotion abuse, or whose emails are deemed risky based on email patterns and infrastructure. 
We’re also thrilled to introduce <b>Hashed User IDs</b> — per-domain identifiers generated by cryptographically hashing usernames — that give customers better insight into suspicious account activity and greater ability to mitigate potentially fraudulent traffic, without compromising end user privacy.</p><p><b>The new capabilities we’re announcing today go beyond automation, identifying abusive behavior and risky identities among human users </b><b><i>and</i></b><b> bots. </b><a href="https://developers.cloudflare.com/bots/account-abuse-protection/"><u>Account Abuse Protection</u></a> is available in Early Access, and any Bot Management Enterprise customer can use these features at no additional cost for a limited period, until the general availability of Cloudflare Fraud Prevention later this year. If you want to learn more about this Early Access capability, <a href="https://www.cloudflare.com/lp/account-abuse-protection/"><u>sign up here</u></a>.</p>
    <div>
      <h3>Leaked credentials make logins all too vulnerable</h3>
      <a href="#leaked-credentials-make-logins-all-too-vulnerable">
        
      </a>
    </div>
    <p>The barrier to entry for fraudulent behavior is dangerously low, especially with the availability of massive datasets and access to automated tools that commit account fraud at scale. Website owners aren’t just dealing with individual hackers, but with industrialized fraud. Last year, we highlighted how <a href="https://blog.cloudflare.com/password-reuse-rampant-half-user-logins-compromised/"><b><u>41% of logins across our network use leaked credentials</u></b></a>. This number has only grown following the exposure of a database holding <a href="https://cybernews.com/security/billions-credentials-exposed-infostealers-data-leak/"><u>16 billion records</u></a>, and multiple high-profile breaches have since come to light. </p><p>What’s more, users reuse passwords across multiple platforms, meaning a single leak from years ago can still unlock a high-value retail account, or even a bank account, today. Our <a href="https://developers.cloudflare.com/waf/detections/leaked-credentials/#leaked-credentials-fields"><u>leaked credential check</u></a> is a free feature that checks whether a password has been leaked in a known data breach of another service or application on the Internet. This is a privacy-preserving credential checking service that helps protect our users from compromised credentials, meaning Cloudflare performs these checks without accessing or storing plaintext end user passwords. 
<a href="https://blog.cloudflare.com/helping-keep-customers-safe-with-leaked-password-notification/#how-does-cloudflare-check-for-leaked-credentials"><u>Passwords are hashed — i.e., converted into a random string of characters using a cryptographic algorithm — for the purpose of comparing them against a database of leaked credentials.</u></a> If you haven’t already turned on our <a href="https://developers.cloudflare.com/waf/detections/leaked-credentials/#leaked-credentials-fields"><u>leaked credential check</u></a>, enable it now to keep your accounts safe from easy hacks!</p><p>Access to a large database of leaked credentials is only useful if an attacker can cycle through them quickly across many sites to identify which accounts are still vulnerable due to password reuse. In our Black Friday analysis in 2024, we observed that more than <a href="https://blog.cloudflare.com/grinch-bot-2024/"><b><u>60% of traffic to login pages across our network was automated</u></b></a>. That’s a lot of bots trying to break in.</p><p>To help customers protect their login endpoints from constant bombardment, we added <a href="https://www.cloudflare.com/learning/access-management/account-takeover/"><u>account takeover</u></a> <a href="https://developers.cloudflare.com/bots/additional-configurations/detection-ids/account-takeover-detections/"><u>(ATO)-specific detections</u></a> to highlight suspicious traffic patterns. This is part of our recent focus on <a href="https://blog.cloudflare.com/per-customer-bot-defenses/"><u>per-customer detections</u></a>, in which we provide behavioral anomaly detection unique to each bot management customer. Today, bot management customers can see and mitigate attempted ATO attacks in their login requests directly on the Security analytics dashboard.</p>
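The hash-and-compare approach described above (hash locally, compare against a database of leaked credential hashes) can be illustrated with a simplified, k-anonymity-style sketch in the spirit of range-query services such as Have I Been Pwned. Everything here is hypothetical: the tiny breach corpus, the 5-character prefix, and the function names are invented for illustration, and this is not Cloudflare's actual protocol.

```python
import hashlib

# Hypothetical in-memory "breach corpus". A real service holds billions
# of entries; the prefix-bucket trick below means a client only ever
# reveals a short hash prefix, never the full hash or the password.
LEAKED_SHA1 = {
    hashlib.sha1(pw.encode()).hexdigest().upper()
    for pw in ["password123", "letmein", "qwerty"]
}
BUCKETS = {}
for digest in LEAKED_SHA1:
    BUCKETS.setdefault(digest[:5], set()).add(digest[5:])

def is_leaked(password: str) -> bool:
    """Hash locally, look up only the 5-char prefix, compare suffixes locally."""
    digest = hashlib.sha1(password.encode()).hexdigest().upper()
    prefix, suffix = digest[:5], digest[5:]
    candidates = BUCKETS.get(prefix, set())  # stands in for the remote range query
    return suffix in candidates

print(is_leaked("password123"))  # True: present in the toy breach corpus
print(is_leaked("correct horse battery staple"))  # False
```

The point of the prefix bucketing is that the plaintext password, and even its full hash, never leave the client; the server only learns a prefix shared by many unrelated passwords.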
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3f2nQ5wBVQ2WqiKGsjVWJe/3c1011ced84e46f65938f32c88035de9/image5.png" />
          </figure><p><sup><i>In the card on the left within the Security analytics dashboard, you can view and address attempted account takeover attacks.</i></sup></p><p>In the last week, our ATO detections combined caught an average of <b>6.9 billion suspicious login attempts</b> daily, across our network. These ATO detections, along with the many other detection mechanisms in our bot management solution, create a <i>layered defense</i> against ATO and other malicious automated attacks.</p>
    <div>
      <h3>From automation to intent and identity</h3>
      <a href="#from-automation-to-intent-and-identity">
        
      </a>
    </div>
    <p>To discern automation, or to discern intent and identity? That is the question. Our answer: yes and yes, as both are critical layers of a robust security posture. Attackers now operate at a scale previously reserved for enterprise services: they leverage massive credential leaks, use human-powered fraud farms to spoof devices and locations, and create synthetic identities to maintain thousands — even millions — of fake accounts for promotion and platform abuse. A human being with automated tools could be draining accounts, abusing promotions, committing payment fraud, or all of the above.</p><p>Beyond that, automation is accessible like never before, particularly as users become better acquainted with using <a href="https://www.cloudflare.com/learning/ai/what-is-agentic-ai/"><u>AI agents</u></a> and even long-standing, “traditional” browsers move toward having agentic capabilities by default. Whether it’s a lone actor using an AI agent or a coordinated fraud campaign, the threat isn’t as simple as a single script — it can involve human intent, with automated execution.</p><p>Consider the following scenarios we’ve heard from our customers:</p><ul><li><p>We have 1,000 new users this month, but more than half of them are fake identities who benefit from a free trial, then disappear.</p></li><li><p>The attacker logged in with the correct password, so how do I know that it isn’t the real user?</p></li><li><p>This entity is acting at human pace, and they are draining accounts.</p></li></ul><p>These problems can't be solved by <i>only</i> assessing automation; they require checking for authenticity and integrity. This is the gap that our dedicated fraud prevention capabilities address.</p>
    <div>
      <h3>Assessing suspicious emails</h3>
      <a href="#assessing-suspicious-emails">
        
      </a>
    </div>
    <p>Let’s start by assessing the earliest point of potential account abuse: account creation. Fake or bulk account creation is one of the biggest topics in conversations about website fraud, as it can open the door for attackers to access an application — or even an entire business model. </p><p>Cloudflare is giving customers the tools to assess suspicious account creation at the source in two ways:</p><ol><li><p><b>Disposable email check</b>: Detect when users sign up with disposable, or throwaway, email addresses commonly used for promotion abuse and fake account creation. These disposable email services allow attackers to spin up thousands of "unique" accounts without maintaining real infrastructure. This is particularly true of unauthenticated disposable email services that grant instant inbox access without any signup, and of services offering free, unlimited email aliases. Customers can use this binary field as they build rules to enforce security preferences, choosing to block all disposable emails outright, or perhaps issuing a <a href="https://developers.cloudflare.com/cloudflare-challenges/challenge-types/"><u>challenge</u></a> to anyone attempting to create an account with a disposable email.
</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3PQC7PqKWrhl5c4OCXu5Ha/9340e3b49cc396ca5f5d01d34fd529d5/image2.png" />
          </figure></li><li><p><b>Email risk:</b> Cloudflare analyzes email patterns and infrastructure to provide risk tiers (low, medium, high) that customers can use in security rules. We know that not all email addresses are created equal; an address with the format <code>firstname.lastname@knowndomain.com</code> carries different risk characteristics than <code>xk7q9m2p@newdomain.xyz</code>. Email risk tiers allow customers to express their tolerance for risk and friction at the point of account creation. </p></li></ol><p>Both disposable email check and email risk are now available in security analytics and security rules, equipping website owners to protect their account creation flow. These detections address a fundamental problem: by the time an account is committing abuse, it's already too late. The website owner has already paid acquisition costs, the fraudulent user has consumed promotional credits, and remediation requires manual review. Mitigating suspicious emails means adding the appropriate friction at signup — the moment it matters most.</p>
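As a rough illustration of how a binary disposable-email field and a three-tier risk field might be derived, here is a toy sketch. The domain list and the digit-counting heuristic are invented for the example; the real detections rely on much richer pattern and infrastructure signals.

```python
# Hypothetical deny-list; real detection tracks thousands of disposable
# email providers and their constantly rotating domains.
DISPOSABLE_DOMAINS = {"mailinator.com", "tempmail.dev", "10minutemail.com"}

def is_disposable(email: str) -> bool:
    # Binary field: the domain appears on a (toy) disposable-provider list.
    return email.rsplit("@", 1)[-1].lower() in DISPOSABLE_DOMAINS

def email_risk(email: str) -> str:
    """Toy three-tier (low/medium/high) classifier on address shape alone."""
    local = email.lower().split("@", 1)[0]
    if is_disposable(email):
        return "high"
    # Random-looking local parts (several scattered digits, or very short)
    # carry different risk than firstname.lastname-style addresses.
    if sum(c.isdigit() for c in local) >= 3 or len(local) <= 2:
        return "medium"
    return "low"

print(email_risk("firstname.lastname@knowndomain.com"))  # low
print(email_risk("xk7q9m2p@newdomain.xyz"))              # medium
print(email_risk("anything@mailinator.com"))             # high
```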
    <div>
      <h3>Introducing Hashed User IDs</h3>
      <a href="#introducing-hashed-user-ids">
        
      </a>
    </div>
    <p>Understanding patterns of abuse requires <i>visibility</i>: not only into the network, but of account activity. Traditionally, security has meant looking through the lens of IPs and isolated HTTP requests to spot automated activity, but website owners aren’t just thinking in terms of network signals; they are also considering their users and known accounts. That’s why we’re expanding our mitigation toolbox to match the way applications are actually structured, focusing on user-based detection of fraudulent activity.</p><p>Attackers can effortlessly rotate IPs to hide their tracks. But forcing them to repeatedly generate new, credible accounts introduces massive friction, especially when combined with account creation protections. When we look past the network layer and map fraudulent actions to a given compromised or abusive account, we can spot targeted behavior tied to a single, persistent actor and put a stop to the abuse. In this way, we’re shifting the defense strategy to the account level, instead of playing whack-a-mole with rotating IP addresses and residential proxies. This means that <b>our customers can mitigate abusive behavior based on the way </b><b><i>their</i></b><b> applications separate identity</b>.</p><p>To arm website owners with this capability, Cloudflare is releasing a <a href="https://developers.cloudflare.com/bots/account-abuse-protection/#user-id"><b><u>Hashed User ID</u></b></a> that customers can use in <a href="https://developers.cloudflare.com/waf/analytics/security-analytics/"><u>Security analytics</u></a>, <a href="https://developers.cloudflare.com/waf/custom-rules/"><u>Security rules</u></a>, and <a href="https://developers.cloudflare.com/rules/transform/managed-transforms/reference/"><u>Managed Transforms</u></a>. 
User IDs are per-domain, cryptographically hashed versions of the values in the username field, and each user ID is an encrypted, unique, and stable identifier generated for a given username on a customer application. <b>Importantly, the actual username is not logged or stored by Cloudflare as part of this service.</b> As with leaked credentials check and ATO detections, which identify login traffic and then encrypt credentials for comparison, we are prioritizing end user privacy while empowering our customers to take action against fraudulent behavior.</p><p>With access to Hashed User IDs, website owners can:</p><ul><li><p>See top users: Which accounts have the most activity?</p></li><li><p>See when a unique user logs in from a country they usually don’t — or multiple countries in one day!</p></li><li><p>Mitigate traffic based on unique user, such as blocking a user with historically suspicious activity.</p></li><li><p>Combine fields to see when accounts are being targeted with leaked credentials.</p></li><li><p>See what network patterns or signals are associated with unique users.</p></li></ul>
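Cloudflare does not document the exact construction, but the properties listed above (stable for a given username, scoped per domain, and never exposing the plaintext) are what a per-domain keyed hash provides. A minimal sketch, with a hypothetical server-side secret and invented function names:

```python
import hmac
import hashlib

SECRET = b"server-side secret"  # hypothetical; never exposed to clients

def hashed_user_id(domain: str, username: str) -> str:
    """Stable, pseudonymous per-domain ID: same domain + username gives the
    same ID, a different domain gives an unlinkable ID, and the plaintext
    username is never stored."""
    # Derive a per-domain key, then hash the username under that key.
    key = hmac.new(SECRET, domain.lower().encode(), hashlib.sha256).digest()
    return hmac.new(key, username.lower().encode(), hashlib.sha256).hexdigest()[:32]

a = hashed_user_id("shop.example", "alice@example.com")
b = hashed_user_id("shop.example", "alice@example.com")
c = hashed_user_id("bank.example", "alice@example.com")
print(a == b)  # True: stable for the same domain and username
print(a == c)  # False: the same user is unlinkable across domains
```

A keyed hash (rather than a plain hash) matters here: without the secret, an observer cannot confirm a guessed username by hashing it themselves.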
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3f7Jm4HnngjYEmKG8QSiyC/2ae3543f0cd0eb072a0c4c2bb12c4436/image4.png" />
          </figure><p><sup><i>The expanded view of a single Hashed User ID within the Security analytics dashboard, showing the activity details of that unique user, including their login location and their browser. </i></sup></p><p>This user-level visibility transforms how website owners can investigate and mitigate traffic. Instead of examining individual requests in isolation, our customers can see the full picture of how attackers are targeting and hiding among legitimate users.</p>
    <div>
      <h3>Take the next step in account protection today</h3>
      <a href="#take-the-next-step-in-account-protection-today">
        
      </a>
    </div>
    <p>If you want to learn more about this Early Access capability, <a href="https://www.cloudflare.com/lp/account-abuse-protection/"><u>sign up here</u></a>. All Bot Management Enterprise customers are eligible to add these new Account Abuse Protection features today, and we’d love to open the conversation with any and all <a href="http://www.cloudflare.com/lp/account-abuse-protection"><u>prospective Bot Management customers</u></a>.</p><p>While bot detections will continue to answer the question of automation and intent, fraud detections delve into the question of authenticity. Together, they give website owners comprehensive tools to fight against the full spectrum of account abuse. This suite is one step in our ongoing investment to protect the entire user journey — from account creation and login to secure checkouts and the integrity of every interaction.</p> ]]></content:encoded>
            <category><![CDATA[Fraud]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">3oZLDQYiufcZZYvGXwxpKd</guid>
            <dc:creator>Jin-Hee Lee</dc:creator>
        </item>
        <item>
            <title><![CDATA[Building unique, per-customer defenses against advanced bot threats in the AI era]]></title>
            <link>https://blog.cloudflare.com/per-customer-bot-defenses/</link>
            <pubDate>Tue, 23 Sep 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Today, we are announcing a new approach to catching bots: using models to provide behavioral anomaly detection unique to each bot management customer and stop sophisticated bot attacks.  ]]></description>
            <content:encoded><![CDATA[ <p>Today, we are announcing a new approach to catching bots: using models to provide <b>behavioral anomaly detection </b><b><i>unique to each bot management customer</i></b> and stop sophisticated bot attacks. </p><p>With this per-customer approach, we’re giving every bot management customer hyper-personalized security capabilities to stop even the sneakiest bots. We’re doing this by not only making a first-request judgement call, but also by tracking behavior of bots who play the long-game and continuously execute unwanted behavior on our customers’ websites. We want to share how this service works, and where we’re focused. Our new platform has the power to fuel hundreds of thousands of unique detection suites, and we’ve heard our first target loud and clear from site owners: <a href="https://www.cloudflare.com/the-net/building-cyber-resilience/regain-control-ai-crawlers/"><u>protect websites</u></a> from the explosion of sophisticated, AI-driven web scraping.</p>
    <div>
      <h2>The new arms race: the rise of AI-driven scraping</h2>
      <a href="#the-new-arms-race-the-rise-of-ai-driven-scraping">
        
      </a>
    </div>
    <p>The battle against malicious bots used to be a simpler affair. Attackers used scripts that were fairly easy to identify through static, predictable signals: a request with a missing User-Agent header, a malformed method name, or traffic from a non-standard port was a clear indicator of malicious intent. However, the Internet is always evolving. As websites became more dynamic to create rich user experiences, attackers evolved their tools in response. The simple scripts of yesterday were replaced by headless browsers and automation frameworks, capable of rendering pages and mimicking human interaction with far greater fidelity.</p><p>AI has made this even trickier. The rise of <a href="https://www.cloudflare.com/learning/ai/what-is-generative-ai/"><u>Generative AI</u></a> has fundamentally changed the capabilities and the motivations of attackers. The web scraping of today isn’t limited to competitive price intelligence or content aggregation, but is driven by the voracious appetite of <a href="https://www.cloudflare.com/learning/ai/what-is-large-language-model/"><u>Large Language Models (LLMs)</u></a> for training data.</p><p>Cloudflare’s data shows this shift in stark terms. In mid-2025, <a href="https://radar.cloudflare.com/ai-insights?dateStart=2025-07-01&amp;dateEnd=2025-07-07#crawl-purpose"><b><u>crawling for the purpose of AI model training accounted for nearly 80% of all AI bot activity</u></b></a> on our network, a significant increase from the year prior. Modern scraping tools are now AI-powered themselves. They leverage LLMs for semantic understanding of page content, use computer vision to solve visual challenges, and employ reinforcement learning to navigate complex websites they’ve never seen before. The evolution of these bots exposes a critical vulnerability in the traditional, one-size-fits-all approach to security. 
While global threat intelligence is immensely powerful for stopping widespread attacks, these new <b>AI-powered scrapers are designed to blend in</b>. They can rotate IP addresses through residential proxies, generate human-like user agents, and mimic plausible browsing patterns. A request from one of these bots might not look anomalous when compared to the trillions of requests we see across the Cloudflare network, but would appear anomalous when compared to the established patterns of legitimate users on a specific website. This means we need to build defenses against these bots from every angle we have — from the global view to specific behavior on a single application. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3muiMDClrUwUrh5yoDbqlv/9df48cc59dcefed98b16b7df7f72fbd6/image3.png" />
          </figure>
    <div>
      <h2>Globally scalable bot fingerprinting</h2>
      <a href="#globally-scalable-bot-fingerprinting">
        
      </a>
    </div>
    <p>To target specific well-known bots or bot actors, we leverage the Cloudflare network to fingerprint bots that we see behave similarly across millions of websites. Since June, Cloudflare’s bot detection security analysts have written <b>50 heuristics</b> to catch bots using a variety of signals, including but not limited to <b>HTTP/2 fingerprints</b> and <b>Client Hello extensions. </b>By observing traffic on millions of websites, we establish a baseline of legitimate fingerprints of common browsers and benign devices. When a new, unique fingerprint suddenly appears across many different sites, it's a tell-tale sign of a distributed botnet or a new automation tool, allowing our analysts to block the bot's signature itself and neutralize the entire campaign, regardless of the thousands of different IP addresses it might use.</p><p>Recently, we also introduced <a href="https://developers.cloudflare.com/bots/additional-configurations/detection-ids/#additional-detections"><b><u>detection improvements to tackle residential proxy networks</u></b></a> and similar commercial proxies, which are used by attackers to make their bots appear as thousands of distinct real visitors, allowing them to bypass traditional security measures. The superpower of this detection improvement? Combining the vast amount of network data we see with particular client-side fingerprints obtained through the millions of challenge solves that happen across the Internet daily. 
<a href="https://developers.cloudflare.com/cloudflare-challenges/"><u>Challenges</u></a> have always served as an ideal mitigation action for customers who want to protect their applications without compromising real-user experience, but now they also serve as a gift that keeps on giving: in this case, <b><i>feeding the Cloudflare threat detection teams a constant stream of client-side information</i></b> that allows us to pattern-match and identify IP addresses that are used by residential proxy networks.</p><p>This detection improvement is already ingesting data from the entire Cloudflare network, automatically catching more malicious traffic for all customers using <a href="https://developers.cloudflare.com/bots/get-started/super-bot-fight-mode/"><u>Super Bot Fight Mode</u></a> (bot protection included for Pro, Business, and all Enterprise customers) and <a href="https://developers.cloudflare.com/bots/get-started/bot-management/"><u>Enterprise Bot Management</u></a>. Examining 7 days of data at the time of writing, we’ve observed <b>11 billion requests</b> from millions of unique IP addresses that we’ve identified as connected to residential or commercial proxy networks. This is just one piece of the global detection puzzle; the existing <a href="https://blog.cloudflare.com/residential-proxy-bot-detection-using-machine-learning/"><u>residential proxy detection features in our ML</u></a> already catch <i>tens of millions of requests every hour</i>. </p>
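The fingerprinting heuristic described earlier in this section, where a fingerprint that was never seen in the baseline suddenly appears across many different sites, can be sketched in a few lines. The fingerprint labels, site names, and threshold below are all illustrative, not real detection parameters.

```python
from collections import defaultdict

def site_spread(events):
    """Map each fingerprint to the set of distinct sites it was seen on."""
    spread = defaultdict(set)
    for fingerprint, site in events:
        spread[fingerprint].add(site)
    return spread

def flag_new_widespread(baseline_events, window_events, min_sites=3):
    """Flag fingerprints absent from the baseline that suddenly show up on
    many distinct sites: the tell-tale of a distributed botnet or new tool."""
    known = set(site_spread(baseline_events))
    current = site_spread(window_events)
    return {fp for fp, sites in current.items()
            if fp not in known and len(sites) >= min_sites}

baseline = [("ja4_chrome", "a.com"), ("ja4_chrome", "b.com"),
            ("ja4_safari", "a.com")]
window = ([("ja4_chrome", "a.com")] +
          [("ja4_newbot", s) for s in ["a.com", "b.com", "c.com", "d.com"]])
print(flag_new_widespread(baseline, window))  # {'ja4_newbot'}
```

Blocking on the flagged signature itself, rather than on source IPs, is what neutralizes the whole campaign regardless of how many addresses it rotates through.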
    <div>
      <h2>Hyper-personalized security: learning what's normal for <i>you</i></h2>
      <a href="#hyper-personalized-security-learning-whats-normal-for-you">
        
      </a>
    </div>
    <p>The new arms race against AI-powered bots necessitates a closer look — something more precise. For instance, a script that systematically scrapes every user profile on a social media site, or every product listing on an e-commerce platform, is exhibiting behavior that is fundamentally abnormal for <i>that application</i>, even if a standalone request appears benign. This realization is at the heart of our new strategy: to win this new arms race, defenses must become as bespoke and adaptive as the attacks they face.</p><p>To meet this challenge, we built a new, foundational platform engineered to deploy custom <a href="https://www.cloudflare.com/learning/ai/what-is-machine-learning/"><u>machine learning models</u></a> for every bot management customer. We’re creating a unique defense for every application. Because each website has different traffic, the traffic that we flag as anomalous will, of course, be different for each zone — for this system, we want to be clear that data from one customer’s zone won’t be used to train the model for another customer’s use.</p><p>Announcing this as a new platform capability, rather than a single feature, is a deliberate choice. It aligns with how we’ve approached our most significant innovations, from <a href="https://www.cloudflare.com/developer-platform/products/workers/"><u>Cloudflare Workers</u></a> changing how developers build applications, to <a href="https://www.cloudflare.com/developer-platform/products/ai-gateway/"><u>AI Gateway</u></a> creating a single control plane for AI observability and security. 
By focusing on the platform, we tackle the <a href="https://www.cloudflare.com/learning/ai/how-to-prevent-web-scraping/">scraping problems</a> our customers are seeing today <i>and</i> power future detections as bot attacks become increasingly sophisticated.</p><p>Our new generation of per-customer anomaly detection is a three-step process, designed to identify malicious behavior by first understanding what constitutes legitimate traffic for each individual website and API.</p>
    <div>
      <h3>Step 1: Establishing a dynamic baseline</h3>
      <a href="#step-1-establishing-a-dynamic-baseline">
        
      </a>
    </div>
    <p>For each customer zone, our behavioral detections ingest traffic data to build a baseline of normal activity. Rather than taking a static snapshot, our new platform ingests data to make living, continuously updated calculations of what “normal” looks like on a specific website. This approach understands seasonality, recognizes traffic spikes from legitimate marketing campaigns, and maps the typical pathways users take through a site. This approach evolves the concept of Anomaly Detection already present in our Enterprise Bot Management suite, but applies it at a far more granular and dynamic per-customer level.</p>
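One common way to maintain a living, continuously updated baseline of "normal" is an exponentially weighted mean and variance, which adapts to drift while still flagging sharp deviations. This is a generic sketch of that idea, not Cloudflare's actual model; the smoothing factor and traffic numbers are invented.

```python
class DynamicBaseline:
    """Continuously updated estimate of normal request volume, using an
    exponentially weighted mean/variance rather than a static snapshot."""

    def __init__(self, alpha=0.1):
        self.alpha, self.mean, self.var = alpha, None, 0.0

    def update(self, value: float) -> float:
        """Fold in one observation; return its anomaly z-score first."""
        if self.mean is None:
            self.mean = value
            return 0.0
        z = abs(value - self.mean) / (self.var ** 0.5 or 1.0)
        delta = value - self.mean
        self.mean += self.alpha * delta
        self.var = (1 - self.alpha) * (self.var + self.alpha * delta * delta)
        return z

baseline = DynamicBaseline()
for rpm in [100, 104, 98, 101, 103, 99, 102]:  # steady legitimate traffic
    baseline.update(rpm)
spike = baseline.update(500)  # sudden scraping burst
print(spike > 10)  # the burst sits far outside the learned baseline
```

Because the mean and variance keep updating, a legitimate sustained increase (say, a marketing campaign) is absorbed into the baseline over time instead of alerting forever.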
    <div>
      <h3>Step 2: Identifying the anomalies</h3>
      <a href="#step-2-identifying-the-anomalies">
        
      </a>
    </div>
    <p>Once the baseline of "normal" is established, we begin the true work — identifying deviations. Because the baseline is specific to each website, the anomalies detected are highly contextual, perhaps even invisible to a global system. We can examine a few different types of websites to unpack this:</p><ul><li><p><b>For a gaming company:</b> A normal traffic baseline might show millions of users making frequent, rapid API calls to a matchmaking service or an in-game inventory system. A behavioral detection model trained on this baseline would immediately flag a single user making slow, methodical, sequential API calls to scrape the entire player leaderboard. This behavior, while low in volume, is a clear anomaly against the backdrop of normal gameplay patterns.</p></li><li><p><b>For a retail website:</b> The normal baseline is a complex funnel of users browsing categories, viewing products, adding items to a cart, and proceeding to checkout. These detections would identify an actor that systematically visits every single product page in alphabetical order at a machine-like pace, without ever interacting with the cart or session cookies, as a significant anomaly indicative of <a href="https://www.cloudflare.com/learning/bots/what-is-content-scraping/"><u>content scraping</u></a>.</p></li><li><p><b>For a media publisher:</b> Normal user behavior involves reading a few articles, following internal links, and spending a measurable amount of time on each page. An anomaly would be a script that hits thousands of article URLs per minute, spending less than a second on each, purely to extract the text content for AI model training.</p></li></ul><p>In each case, the malicious activity is defined not by a universal signature, but <b><i>by its deviation from the application's unique, established norm</i></b>.</p>
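The retail example above, a visitor hitting every product page in order at a machine-like pace, can be caricatured as a check on path ordering and inter-request timing regularity. The thresholds and session data here are invented for illustration and are far simpler than a real behavioral model.

```python
from statistics import pstdev, mean

def looks_like_scraper(events, max_jitter=0.1, min_requests=10):
    """Flag a session whose pages are visited in sorted order at a
    near-constant cadence: abnormal for a human browsing this site."""
    if len(events) < min_requests:
        return False
    times = [t for t, _ in events]
    paths = [p for _, p in events]
    gaps = [b - a for a, b in zip(times, times[1:])]
    machine_pace = pstdev(gaps) <= max_jitter * mean(gaps)
    sequential = paths == sorted(paths)
    return machine_pace and sequential

# Bot: every product page, in order, exactly every 2 seconds.
bot = [(i * 2.0, f"/product/{i:04d}") for i in range(20)]
# Human: irregular timing, jumping between products, cart, and checkout.
human = [(0.0, "/"), (4.2, "/product/0007"), (9.1, "/product/0003"),
         (31.0, "/cart"), (33.5, "/product/0012"), (60.2, "/cart"),
         (61.0, "/checkout"), (95.7, "/product/0001"), (96.2, "/cart"),
         (130.9, "/account")]
print(looks_like_scraper(bot))    # True
print(looks_like_scraper(human))  # False
```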
    <div>
      <h3>Step 3: Generating actionable findings</h3>
      <a href="#step-3-generating-actionable-findings">
        
      </a>
    </div>
    <p>Detecting an anomaly is only half the battle. The power of bot management comes from its seamless integration into the Cloudflare security ecosystem you already use, turning detection into immediate, actionable findings. Customers can benefit from these behavioral detection improvements in two ways:</p><ol><li><p><b>New Bot Detection IDs: </b>For our Enterprise customers, we’re introducing a new set of <a href="https://developers.cloudflare.com/bots/additional-configurations/detection-ids/"><u>Bot Detection IDs</u></a>. Website owners and security teams can write WAF security rules to challenge, rate-limit, or block traffic based on the specific anomalies flagged by these detections. Since each detection type is tied to a unique ID, customers can see exactly what kind of behavior caused a request to be flagged as anomalous, offering a detailed, per-request view into stealthy malicious traffic. And for a wider view, customers can filter by Detection ID from their Security Analytics, to see the bigger picture of all traffic captured by that detection type.</p></li><li><p><b>Improving Bot Score:</b> Another key output from these new, per-customer models will be to directly influence the Bot Score of a request. A request flagged as anomalous will have its score lowered, moving it into the "Likely Automated" (scores 2-29) or "Automated" (score 1) categories. This means that existing WAF custom rules based on Bot Score will automatically see impact and become more effective against bespoke attacks, with no changes required. 
This functionality update is available today for our latest <a href="https://developers.cloudflare.com/bots/additional-configurations/detection-ids/#account-takeover-detections"><u>account takeover detection</u></a>, <a href="https://blog.cloudflare.com/residential-proxy-bot-detection-using-machine-learning/"><u>residential proxy detections</u></a> and our recent <a href="https://developers.cloudflare.com/bots/additional-configurations/detection-ids/#additional-detections"><u>enhancements</u></a>, and will be implemented in the future for our behavioral scraping detection. </p></li></ol><p>This three-step process is already in action with our behavioral detections to catch <a href="https://developers.cloudflare.com/bots/additional-configurations/detection-ids/#account-takeover-detections"><u>account takeover</u></a> attacks. Taking bot detection ID 201326598 as an example: it (1) establishes a zone-level baseline that understands what normal traffic patterns look like for a specific website, (2) examines anomalous login failures to identify brute force and credential stuffing attacks, then (3) allows customers to mitigate these attacks by automatically influencing bot score <i>and</i> offering more visibility with the detection ID’s analytics. </p>
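The score-lowering behavior described in point 2 can be sketched as follows. The "Automated" (score 1) and "Likely Automated" (scores 2-29) buckets come from the text above; the floor value, function names, and the label for scores above 29 are illustrative.

```python
def categorize(score: int) -> str:
    """Bot Score buckets as described above; the last label is illustrative."""
    if score == 1:
        return "Automated"
    if 2 <= score <= 29:
        return "Likely Automated"
    return "Not Automated"

def apply_anomaly(score: int, anomalous: bool, floor: int = 29) -> int:
    """Sketch: an anomaly verdict lowers the score into an automated bucket,
    so existing WAF rules keyed on Bot Score fire with no rule changes."""
    return min(score, floor) if anomalous else score

score = 85  # looked human on per-request signals alone
adjusted = apply_anomaly(score, anomalous=True)
print(adjusted, categorize(adjusted))  # 29 Likely Automated
```

This is why existing Bot Score rules become more effective automatically: the per-customer model changes the score, and downstream rules need no modification.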
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5w8HUyr51JD8K4EYT7teeL/ed825aa96c3ae1809199d32734f0e60d/image4.png" />
          </figure><p>This integration strategy creates a flywheel effect: the new intelligence from these improved detections immediately enhances the value of existing products like Super Bot Fight Mode, Bot Management, and the WAF, making the entire Cloudflare platform stronger for you.</p>
    <div>
      <h2>Taking on sophisticated scrapers</h2>
      <a href="#taking-on-sophisticated-scrapers">
        
      </a>
    </div>
    <p>The first challenge we’re tackling is sophisticated scraping. AI-driven scraping is one of the most pressing and rapidly evolving threats facing website owners today, and its adaptive nature makes it an ideal first adversary for a system designed to fight an enemy that constantly changes its tactics.</p><p>The first generation of our improved behavioral detections is tuned specifically to detect scraping by analyzing signals that go beyond simple request headers. These include:</p><ul><li><p><b>Behavioral Analysis:</b> Looking at session traversal paths, the sequence of requests, and interaction (or lack thereof) with dynamic page elements.</p></li><li><p><b>Client Fingerprinting:</b> Analyzing subtle signals from the client, such as JA4 fingerprints, to identify signs of automation in the context of the customer's specific traffic baseline.</p></li><li><p><b>Content-Agnostic Detection:</b> These models do not need to understand the content of a page, only the patterns of how it is being accessed. This makes them highly scalable and efficient, without using the unique content on a website to make judgment calls.</p></li></ul><p>How do these scraping detections perform in practice? We validated our scraping detection logic with early adopters in a closed beta to gather ground-truth feedback and tune our detections. As with any ideal detection, our goal is to capture as much malicious traffic as possible without compromising the experience of legitimate website visitors. Looking at just a 24-hour period, our new scraping detections have caught hundreds of millions of requests, flagging <b>138 million scraping requests on just 5 of our early beta zones</b>.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3dmVkAJR9ELqrGMFR4tbcI/732bbb2477c350ec97d8fcd70d57b782/image2.png" />
          </figure><p>Naturally, we see an overlap with our existing system of bot scoring, but the numbers here show concretely that our new behavioral detections add distinct value: <b>34% of the requests flagged by our new scraping detections would not have been detected by our existing bot score system</b>, making us all the more eager to use these novel detections to inform the way we score automation.</p>
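<p>The per-zone “baseline, then anomaly” step described in this post can be illustrated with a minimal sketch. This is a hypothetical helper over a single counter (e.g. login failures per minute), not Cloudflare's actual model, which draws on far richer behavioral signals:</p>

```python
from statistics import mean, stdev

def flag_anomalies(baseline, current, threshold=3.0):
    """Flag intervals in `current` that sit more than `threshold`
    standard deviations above the zone's historical baseline."""
    mu, sigma = mean(baseline), stdev(baseline)
    return [(i, v) for i, v in enumerate(current)
            if sigma > 0 and (v - mu) / sigma > threshold]

# Login failures per minute: a quiet zone, then a sudden burst.
print(flag_anomalies([4, 5, 6, 5, 4, 6, 5], [5, 180, 6]))  # → [(1, 180)]
```

<p>The key point is that the threshold is relative to each zone's own traffic, so the same burst that is anomalous for a small login page may be perfectly normal for a large one.</p>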
    <div>
      <h2>A birthday gift for the Internet</h2>
      <a href="#a-birthday-gift-for-the-internet">
        
      </a>
    </div>
    <p>Our mission to help build a better Internet means that when we develop powerful new defenses, we believe in democratizing access to them. Protecting the entire Internet from new and evolving threats requires raising the baseline of security for everyone.</p><p>In that spirit, we’re excited to announce that our enhanced behavioral detections will not only roll out to Bot Management customers, but will also benefit Cloudflare customers using our global Super Bot Fight Mode system. For our Enterprise Bot Management customers, we automatically tune our detections based on the exact traffic for each zone. Because these advanced models are trained on your zone’s specific traffic, they detect even the most evasive attacks: from account takeovers to web scraping to other attacks executed through residential proxy networks — and we consider this only the tip of the iceberg of behavioral bot profiling. </p>
    <div>
      <h2>The road ahead</h2>
      <a href="#the-road-ahead">
        
      </a>
    </div>
    <p>Our initial focus on scraping is just the beginning of a new wave of behavioral bot detections. The infrastructure we’ve built is a flexible, powerful foundation for tackling a wide range of malicious behavior on your websites; the same principles of establishing a per-customer baseline and detecting anomalies can be applied to other critical threats that are unique to an application's logic, such as credential stuffing, inventory hoarding, carding attacks, and API abuse.</p><p>We are moving into an era where generic defenses are no longer enough. As threats become more personal, so must the defenses against them, and paving this path of behavioral detections is our latest gift to the Internet. Our first offering of scraping behavioral detections is just around the corner: customers will be able to turn on this new detection from the <a href="https://dash.cloudflare.com/?to=/:account/:zone/security/overview"><u>Security Overview</u></a> page in their dashboard. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/9EW8B0vJ43k28c5USM5Ho/6a180ca73844c7432749ca36a12684aa/image5.png" />
          </figure><p>(We’re always looking for enthusiastic humans to help us in our mission against bots! If you’re interested in helping us build a better Internet, check out our <a href="https://www.cloudflare.com/careers/jobs/"><u>open positions.</u></a>)</p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[Bot Management]]></category>
            <guid isPermaLink="false">1l4pM7l0pDUGAgKypKgs15</guid>
            <dc:creator>Jin-Hee Lee</dc:creator>
            <dc:creator>Oliver Payne</dc:creator>
            <dc:creator>Bob AminAzad</dc:creator>
            <dc:creator>Viktor Chynarov</dc:creator>
            <dc:creator>Aleksandar Pavlov Hrusanov</dc:creator>
            <dc:creator>Prajjwal Gupta</dc:creator>
        </item>
        <item>
            <title><![CDATA[The age of agents: cryptographically recognizing agent traffic]]></title>
            <link>https://blog.cloudflare.com/signed-agents/</link>
            <pubDate>Thu, 28 Aug 2025 14:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare now lets websites and bot creators use Web Bot Auth to segment agents from verified bots, making it easier for customers to allow or disallow the many types of user- and partner-directed agents. ]]></description>
            <content:encoded><![CDATA[ <p>On the surface, the goal of handling bot traffic is clear: keep malicious bots away, while letting through the helpful ones. Some bots are evidently malicious — such as mass price scrapers or those testing stolen credit cards. Others are helpful, like the bots that index your website. Cloudflare has segmented this second category of helpful bot traffic through our <a href="https://developers.cloudflare.com/bots/concepts/bot/#verified-bots"><u>verified bots</u></a> program, <a href="https://developers.cloudflare.com/bots/concepts/bot/verified-bots/policy/"><u>vetting</u></a> and validating bots that are transparent about who they are and what they do.</p><p>Today, the rise of <a href="https://agents.cloudflare.com/"><u>agents</u></a> has transformed how we interact with the Internet, often blurring the distinctions between benign and malicious bot actors. Bots are no longer directed only by the bot owners, but also by individual end users to act on their behalf. These bots directed by end users are often working in ways that website owners want to allow, such as planning a trip, ordering food, or making a purchase.</p><p>Our customers have asked us for easier, more granular ways to ensure specific <a href="https://www.cloudflare.com/learning/bots/what-is-a-bot/"><u>bots</u></a>, <a href="https://www.cloudflare.com/learning/bots/what-is-a-web-crawler/"><u>crawlers</u></a>, and <a href="https://www.cloudflare.com/learning/ai/what-is-agentic-ai/"><u>agents</u></a> can reach their websites, while continuing to block bad actors. That’s why we’re excited to introduce <b>signed agents</b>, an extension of our verified bots program that gives a new bot classification in our security rules and in Radar. Cloudflare has long recognized agents — but we’re now endowing them with their own classification to make it even easier for our customers to set the traffic lanes they want for their website. </p>
    <div>
      <h2>The age of agents</h2>
      <a href="#the-age-of-agents">
        
      </a>
    </div>
    <p>Cloudflare has continuously expanded our verified bot categorization to include different functions as the market has evolved. For instance, we first announced our grouping of <a href="https://blog.cloudflare.com/ai-bots/"><u>AI crawler traffic as an official bot category</u></a> in 2023. And in 2024, when OpenAI announced a <a href="https://openai.com/index/searchgpt-prototype/"><u>new AI search prototype</u></a> and introduced <a href="https://platform.openai.com/docs/bots"><u>three different bots</u></a> with distinct purposes, we <a href="https://blog.cloudflare.com/cloudflare-ai-audit-control-ai-content-crawlers/"><u>added three new categories</u></a> to account for this innovation: AI Search, AI Assistant, and Archiver.</p><p>But the bot landscape is constantly evolving. Let's unpack a common type of verified AI bot — an AI crawler such as <a href="https://radar.cloudflare.com/bots/directory/gptbot"><u>GPTBot</u></a>. Even though the bot performs an array of tasks, the bot’s ultimate purpose is a singular, repetitive task on behalf of the operator of that bot: fetch and index information. Its intelligence is applied to performing that singular job on behalf of that bot owner. </p><p>Agents, though, are different. Think about an AI agent tasked by a user to "Book the best deal for a round-trip flight to New York City next month." These agents sometimes use remote browsing products like Cloudflare's <a href="https://developers.cloudflare.com/browser-rendering/"><u>Browser Rendering</u></a> and similar products from companies like Browserbase and Anchor Browser. And here is the key distinction: this particular type of bot isn’t operating on behalf of a single company, like OpenAI in the prior example, but rather the end users themselves. </p>
    <div>
      <h2>Introducing signed agents</h2>
      <a href="#introducing-signed-agents">
        
      </a>
    </div>
    <p>In May, we announced Web Bot Auth, a new method of <a href="https://blog.cloudflare.com/web-bot-auth/"><u>using cryptography to verify bot and agent traffic</u></a>. HTTP message signatures allow bots to authenticate themselves and allow customer origins to identify them. This is one of the authentication methods we use today for our verified bots program. </p><p>What, exactly, is a <a href="https://developers.cloudflare.com/bots/concepts/bot/signed-agents/"><u>signed agent</u></a>? First, they are agents that are generally directed by an end user instead of a single company or entity. Second, the infrastructure or remote browsing platform the agents use signs their HTTP requests via Web Bot Auth, with Cloudflare validating these message signatures. And last, they comply with our <a href="https://developers.cloudflare.com/bots/concepts/bot/signed-agents/policy/"><u>signed agent policy</u></a>.</p><p>The signed agents classification improves on our existing frameworks in a couple of ways:</p><ol><li><p><b>Increased precision and visibility:</b> we’ve updated the <i>Cloudflare bots and agents directory to include signed agents</i> in addition to verified bots. This allows us to verify the cryptographic signatures of a much wider set of automated traffic, and our customers to granularly apply their security preferences more easily. Bot operators can now <i>submit signed agent applications from the Cloudflare dashboard</i>, allowing bot owners to specify to us how they think we should segment their automated traffic. </p></li><li><p><b>Easier controls from security rules</b>: similar to how they can take action on verified bots as a group, our Enterprise customers will be able to take action on <i>signed agents as a group when configuring their security rules</i>. 
This new field will be available in the Cloudflare dashboard under security rules soon.</p></li></ol><p>To apply to have an agent added to Cloudflare’s directory of bots and agents, customers should complete the <a href="https://dash.cloudflare.com?to=/:account/configurations/bot-submission-form"><u>Bot Submission Form</u></a> in the Cloudflare dashboard. Here, they can specify whether the submission should be considered for the signed agents list or the verified bots list. All signed agents will be recognized by their cryptographic signatures through <a href="https://datatracker.ietf.org/doc/html/draft-meunier-web-bot-auth-architecture"><u>Web Bot Auth validation</u></a>. </p>
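<p>To make the mechanism concrete, here is a minimal sketch of what signing a request under Web Bot Auth looks like. It follows the HTTP message signatures structure (RFC 9421, which the Web Bot Auth draft builds on): the signer covers the <code>@authority</code> and <code>Signature-Agent</code> components, signs with an Ed25519 key, and emits <code>Signature-Input</code> and <code>Signature</code> headers. The key ID, agent URL, and expiry window below are illustrative assumptions, not values from any real deployment:</p>

```python
import base64, time
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def build_signature_base(authority, signature_agent, created, expires, keyid):
    # RFC 9421 signature base: covered components, then @signature-params.
    params = ('("@authority" "signature-agent")'
              f';created={created};expires={expires}'
              f';keyid="{keyid}";tag="web-bot-auth"')
    base = (f'"@authority": {authority}\n'
            f'"signature-agent": {signature_agent}\n'
            f'"@signature-params": {params}')
    return base, params

key = Ed25519PrivateKey.generate()  # hypothetical agent signing key
now = int(time.time())
base, params = build_signature_base(
    "example.com", '"https://signer.example"', now, now + 300, "demo-key")
sig = key.sign(base.encode())

# Headers the agent attaches to its outbound request.
headers = {
    "Signature-Agent": '"https://signer.example"',
    "Signature-Input": f"sig1={params}",
    "Signature": f"sig1=:{base64.b64encode(sig).decode()}:",
}

# The validator rebuilds the same base from the request and verifies
# against the published public key; verify() raises on tampering.
key.public_key().verify(sig, base.encode())
```

<p>In practice the public key is fetched from a well-known directory published by the signing platform, which is what lets Cloudflare validate signatures at scale without any shared secret.</p>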
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5caeGdhlmI3dO3GNZKeEUg/0dac239a94732404861b3876f6bdb8b6/BLOG-2930_2.png" />
          </figure><p><sub>The Bot Submission Form, available in the Cloudflare dashboard for bot owners to submit both verified bot and signed agent applications.</sub></p><p>We want to be clear: our verified bots program isn’t going anywhere. In fact, well-behaved and transparent applications that make use of signed agents can further qualify to be a verified bot, if their specific service adheres to our <a href="https://developers.cloudflare.com/bots/concepts/bot/verified-bots/policy/"><u>policy</u></a>. For instance,<a href="https://radar.cloudflare.com/scan"> <u>Cloudflare Radar's URL Scanner</u></a>, which relies on Browser Rendering as a service to scan URLs, is a <a href="https://radar.cloudflare.com/bots/directory/cloudflare-radar-url-scanner"><u>verified bot</u></a>. While Browser Rendering itself does not qualify to be a verified bot, URL Scanner does, since the bot owner (in this case, Cloudflare Radar) directs the traffic sent by the bot and always identifies itself with a unique Web Bot Auth signature — distinct from <a href="https://developers.cloudflare.com/browser-rendering/reference/automatic-request-headers/"><u>Browser Rendering’s signature</u></a>. </p>
    <div>
      <h2>From an agent’s perspective… </h2>
      <a href="#from-an-agents-perspective">
        
      </a>
    </div>
    <p>Since the launch of Web Bot Auth, our own Browser Rendering product has been sending signed Web Bot Auth HTTP headers, and is always given a bot score of 1 for our Bot Management customers. As of today, Browser Rendering will now show up in this new signed agent category. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1F8Z0E6WqJTxLf9G3PLB3a/84e80539be402066fe02ab60c431100a/BLOG-2930_3.png" />
          </figure><p>We’re also excited to announce the first cohort of agents that we’re partnering with and will be classifying as signed agents: <a href="https://openai.com/index/introducing-chatgpt-agent/"><u>ChatGPT agent</u></a>, <a href="https://block.xyz/inside/block-open-source-introduces-codename-goose"><u>Goose</u></a> from Block, <a href="https://docs.browserbase.com/introduction/what-is-browserbase"><u>Browserbase</u></a>, and <a href="https://anchorbrowser.io/"><u>Anchor Browser</u></a>. They are perfect examples of this new classification because their remote browsers are used by their end customers, not necessarily the companies themselves. We’re thrilled to partner with these teams to take this critical step for the AI ecosystem:</p><blockquote><p>“<i>When we built Goose as an open source tool, we designed it to run locally with an extensible architecture that lets developers automate complex workflows. As Goose has evolved to interact with external services and third-party sites on users' behalf, Web Bot Auth enables those sites to trust Goose while preserving what makes it unique. </i><b><i>This authentication breakthrough unlocks entirely new possibilities for autonomous agents</i></b>." – <b>Douwe Osinga</b>, Staff Software Engineer, Block</p></blockquote><blockquote><p><i>"At Browserbase, we provide web browsing capabilities for some of the largest AI applications. We're excited to partner with Cloudflare to support the adoption of Web Bot Auth, a critical layer of identity for agents. </i><b><i>For AI to thrive, agents need reliable, responsible web access.</i></b><i>"</i>  – <b>Paul Klein</b>, CEO, Browserbase</p></blockquote><blockquote><p><i>“Anchor Browser has partnered with Cloudflare to let developers ship verified browser agents. This way </i><b><i>trustworthy bots get reliable access while sites stay protected</i></b><i>.”</i> – <b>Idan Raman</b>, CEO, Anchor Browser</p></blockquote>
    <div>
      <h2>Updated visibility on Radar</h2>
      <a href="#updated-visibility-on-radar">
        
      </a>
    </div>
    <p>We want everyone to be in the know about our bot classifications. Cloudflare began publishing verified bots on our Radar page <a href="https://radar.cloudflare.com/bots#verified-bots"><u>back in 2022</u></a>, meaning anyone on the Internet — Cloudflare customer or not — can see all of our <a href="https://radar.cloudflare.com/bots#verified-bots"><u>verified bots on Radar</u></a>. We dynamically update the list of bots, but show more than just a list: we announced on <a href="https://www.cloudflare.com/en-gb/press-releases/2025/cloudflare-just-changed-how-ai-crawlers-scrape-the-internet-at-large/"><u>Content Independence Day</u></a> that <a href="https://blog.cloudflare.com/ai-search-crawl-refer-ratio-on-radar/#one-more-thing"><u>every verified bot would get its own page</u></a> in our public-facing directory on Radar, which includes the traffic patterns that we see for each bot.</p><p>Our directory has been updated to include <a href="https://radar.cloudflare.com/bots/directory"><b><u>both signed agents and verified bots</u></b></a> — we share exactly how Cloudflare classifies the bots that it recognizes, plus we surface all of the traffic that Cloudflare observes from these many recognized agents and bots. Through this updated directory, we’re not only giving better visibility to our customers, but also striving to set a higher standard for transparency of bot traffic on the Internet. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/65QPFjmbBde3EzHTOwElSL/cccc8f23c37716c251e0c21850855265/BLOG-2930_4.png" />
          </figure><p><sub>Cloudflare Radar’s Bots Directory, which lists verified bots and signed agents. This view is filtered to view only agent entries.</sub></p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2wBz7UwrQQzT7rJJnXiF8C/16eed3f1afd95cac32c4bcb647c6e5e6/BLOG-2930_5.png" />
          </figure><p><sub>Cloudflare Radar’s signed agent page for ChatGPT agent, which includes its traffic patterns for the last 7 days, from August 21, 2025 to August 27, 2025. </sub></p>
    <div>
      <h2>What’s now, what’s next</h2>
      <a href="#whats-now-whats-next">
        
      </a>
    </div>
    <p>As of today, the Cloudflare bot directory supports both bots and agents in a more clear-cut way, and customers or agent creators can submit agents to be signed and recognized <a href="https://dash.cloudflare.com/?to=/:account/configurations/bot-submission-form"><u>through their account dashboard</u></a>. In addition, anyone can see our signed agents and their traffic patterns on Radar. Soon, customers will be able to take action on signed agents as a group within their firewall rules, the same way you can take action on our verified bots. </p><p>Agents are changing the way that humans interact with the Internet. Websites need to know which tools are interacting with them, and the builders of those tools need to be able to scale easily. Message signatures help achieve both of these goals, but this is only step one. Cloudflare will continue to make it easier for agents and websites to interact (or not!) at scale, in a seamless way. </p> ]]></content:encoded>
            <category><![CDATA[AI Week]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[AI Bots]]></category>
            <category><![CDATA[Bot Management]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">1LQFWI1jzZnWAqR4iFMLLi</guid>
            <dc:creator>Jin-Hee Lee</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare protects against critical SharePoint vulnerability, CVE-2025-53770]]></title>
            <link>https://blog.cloudflare.com/cloudflare-protects-against-critical-sharepoint-vulnerability-cve-2025-53770/</link>
            <pubDate>Tue, 22 Jul 2025 16:30:00 GMT</pubDate>
            <description><![CDATA[ Microsoft disclosed two critical vulnerabilities, CVE-2025-53771 and CVE-2025-53770, that are exploited to attack SharePoint servers. ]]></description>
            <content:encoded><![CDATA[ <p>On July 19, 2025,<a href="https://msrc.microsoft.com/blog/2025/07/customer-guidance-for-sharepoint-vulnerability-cve-2025-53770/"> <u>Microsoft disclosed CVE-2025-53770</u></a>, a critical zero-day Remote Code Execution (RCE) vulnerability. Assigned a CVSS 3.1 base score of 9.8 (Critical), the vulnerability affects SharePoint Server 2016, 2019, and the Subscription Edition, along with unsupported 2010 and 2013 versions. Cloudflare’s WAF Managed Rules now includes 2 emergency releases that mitigate these vulnerabilities for WAF customers.</p>
    <div>
      <h3>Unpacking CVE-2025-53770</h3>
      <a href="#unpacking-cve-2025-53770">
        
      </a>
    </div>
    <p>The vulnerability's root cause is <a href="https://nvd.nist.gov/vuln/detail/CVE-2025-53770"><u>improper deserialization of untrusted data</u></a>, which allows a remote, unauthenticated attacker to execute arbitrary code over the network without any user interaction. Moreover, what makes CVE-2025-53770 uniquely threatening is its methodology – the exploit chain, labeled "ToolShell." ToolShell is engineered <i>to play the long-game</i>: attackers are not only gaining temporary access, but also taking the server's cryptographic machine keys, specifically the <code>ValidationKey</code> and <code>DecryptionKey</code>. Possessing these keys allows threat actors to independently forge authentication tokens and <code>__VIEWSTATE</code> payloads, granting them persistent access that can survive standard mitigation strategies such as a server reboot or removing web shells.</p><p>In response to the active nature of these attacks, the U.S. Cybersecurity and Infrastructure Security Agency (CISA) added CVE-2025-53770 to its<a href="https://www.cisa.gov/news-events/alerts/2025/07/20/cisa-adds-one-known-exploited-vulnerability-cve-2025-53770-toolshell-catalog"> <u>Known Exploited Vulnerabilities (KEV) catalog</u></a> with an emergency remediation deadline. The security community's consensus is clear: any organization with an on-premise SharePoint server on the Internet should assume it has been compromised and take immediate action to fully address this vulnerability.</p><p>Since releasing our vulnerability patch in Cloudflare’s WAF Managed Ruleset, we’ve tracked the number of HTTP request matches for the vulnerability, which you can see in the graph below. Notably, we observed a significant peak around 11AM UTC, the morning of July 22, at around 300,000 hits at one point in time. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1lIEI0Bq0Y9KKfejkUo2sB/3e0ae3f0ccfe0d4eec09ef837157323b/image2.png" />
          </figure>
    <div>
      <h3>How does the ToolShell exploit chain work?</h3>
      <a href="#how-does-the-toolshell-exploit-chain-work">
        
      </a>
    </div>
    <p>The ToolShell exploit chain was first demonstrated at the <a href="https://www.zerodayinitiative.com/blog/2025/5/16/pwn2own-berlin-2025-day-two-results"><u>Pwn2Own hacking competition</u></a> in May 2025, where researchers chained an authentication bypass (CVE-2025-49706) with a deserialization RCE (CVE-2025-49704). Unfortunately, this was not the end of ToolShell’s lifespan. Threat actors evidently analyzed the patches to find weaknesses and exploit them in the wild, forcing Microsoft to assign new identifiers and call out CVE-2025-53771 for the authentication bypass. This rapid exploit → patch → bypass cycle shows that threat actors are not merely discovering vulnerabilities, but also systematically reverse-engineering <i>patches</i> to weaponize bypasses. For responders, this shrinks the window to respond and put up defenses – or removes it altogether – highlighting the need for evolving, proactive security postures.</p><p>The ToolShell exploit works in three stages:</p><ol><li><p><b>Authentication Bypass, leveraging CVE-2025-53771</b>: The attack begins with a <code>POST</code> request sent to the <code>/_layouts/15/ToolPane.aspx</code> endpoint, a legacy component of SharePoint. The crux of this authentication bypass is setting the <code>Referer</code> header to <code>/_layouts/SignOut.aspx</code>, which tricks the SharePoint server into trusting the request. With that trust in hand, the attacker skips authentication checks and proceeds with authenticated access.</p></li><li><p><b>Remote Code Execution via Deserialization, CVE-2025-53770: </b>With privileged access, the attacker can interact with the <code>ToolPane.aspx</code> endpoint. The attacker submits a malicious payload in the body of the <code>POST</code> request, triggering the core vulnerability: a deserialization flaw in which the SharePoint application deserializes the attacker-supplied object into executable code on the server. 
At this point, the attacker can execute commands as they wish.</p></li><li><p><b>The Long-Game: Possessing Cryptographic Keys:</b> Finally, to play the long game and maintain continued access, the attacker will use a web shell to steal the server's cryptographic machine keys. By taking the <code>ValidationKey</code> and the <code>DecryptionKey</code>, the attacker obtains the keys SharePoint uses to sign and encrypt its state. Possessing these keys allows the attacker to operate independently long after the original exploit, forging new malicious payloads that the exploited server will accept. This persistent backdoor is what makes the attack method uniquely dangerous.</p></li></ol>
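<p>The first stage of the chain leaves a recognizable request shape. As an illustration only (this is not Cloudflare's managed rule logic, which is not public), a minimal heuristic matching that shape might look like:</p>

```python
from urllib.parse import urlparse

def looks_like_toolshell(method, path, headers):
    """Heuristic match for the ToolShell pattern described above: a POST
    to the ToolPane.aspx endpoint whose Referer spoofs the sign-out page
    to trigger the authentication bypass (CVE-2025-53771)."""
    referer = headers.get("Referer", "")
    return (method.upper() == "POST"
            and path.lower().endswith("/_layouts/15/toolpane.aspx")
            and urlparse(referer).path.lower().endswith("/_layouts/signout.aspx"))
```

<p>A production rule would of course also inspect the request body for deserialization payloads rather than rely on the path and header pattern alone.</p>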
    <div>
      <h3>Cloudflare’s new WAF Managed Rules for CVE-2025-53770, CVE-2025-53771 </h3>
      <a href="#cloudflares-new-waf-managed-rules-for-cve-2025-53770-cve-2025-53771">
        
      </a>
    </div>
    <p>CVE-2025-53770 is a clear example of how modern cyber threats are two-sided, combining an initial breach vector with a mechanism for long-term persistence. This means that a successful defense must address both the immediate RCE vulnerability and the subsequent threat of persistent unauthorized access. </p><p>Once a public proof-of-concept became available for this exploit, Cloudflare’s security analysts crafted and tested new patches, ensuring that they would address not only the initial attack, but also the longer-term threat. </p><p>The team began researching the exploit the evening of July 20, and on July 21, 2025, Cloudflare deployed our emergency WAF Managed Rules to patch the vulnerability, meaning every customer using the Cloudflare Managed Ruleset will automatically be protected from this critical SharePoint vulnerability. These rules were announced on the <a href="https://developers.cloudflare.com/waf/change-log/2025-07-21-emergency/">WAF changelog</a> and took effect immediately.</p>
            <category><![CDATA[WAF]]></category>
            <category><![CDATA[CVE]]></category>
            <guid isPermaLink="false">2RtKFdquX8O4ijNDZvLjyd</guid>
            <dc:creator>Jin-Hee Lee</dc:creator>
            <dc:creator>Vaibhav Singhal</dc:creator>
        </item>
        <item>
            <title><![CDATA[Control content use for AI training with Cloudflare’s managed robots.txt and blocking for monetized content]]></title>
            <link>https://blog.cloudflare.com/control-content-use-for-ai-training/</link>
            <pubDate>Tue, 01 Jul 2025 10:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare is making it easier for publishers and content creators of all sizes to prevent their content from being scraped for AI training by managing robots.txt on their behalf.  ]]></description>
            <content:encoded><![CDATA[ <p>Cloudflare is giving all website owners two new tools to easily control whether AI bots are allowed to access their content for model training. First, customers can let Cloudflare <b>create and manage a robots.txt file</b>, creating the appropriate entries to let crawlers know not to access their site for AI training. Second, all customers can choose a new option to <a href="https://www.cloudflare.com/learning/ai/how-to-block-ai-crawlers/">block AI bots</a> <b>only on portions of their site that are monetized through ads</b>.</p>
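<p>For reference, the kind of entries a managed robots.txt adds look like the following. GPTBot and Bytespider are crawlers discussed later in this post; the actual user-agent list in the managed file is curated and kept up to date by Cloudflare, so treat these lines as an illustrative sketch:</p>

```txt
# Illustrative robots.txt entries disallowing AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: Bytespider
Disallow: /
```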
    <div>
      <h2>The new generation of AI crawlers</h2>
      <a href="#the-new-generation-of-ai-crawlers">
        
      </a>
    </div>
    <p>Creators who monetize their content by showing ads depend on traffic volume. Their livelihood is directly linked to the number of views their content receives. These creators have allowed crawlers on their sites for decades, for a simple reason: search crawlers such as <code>Googlebot</code> made their sites more discoverable, and drove more traffic to their content. Google benefited from delivering better search results to its customers, and site owners benefited through increased views, and therefore increased revenue.</p><p>But recently, a new generation of crawlers has appeared: bots that crawl sites to gather data for training AI models. While these crawlers operate in the same technical way as search crawlers, the relationship is no longer symbiotic. AI training crawlers use the data they ingest from content sites to answer questions for their own customers directly, within their own apps. They typically send much less traffic back to the site they crawled. Our <a href="https://radar.cloudflare.com/"><u>Radar</u></a> team did an analysis of crawls and referrals for sites behind Cloudflare. As HTML pages are arguably the most valuable content for these crawlers, we <a href="https://blog.cloudflare.com/ai-search-crawl-refer-ratio-on-radar/"><u>calculated crawl ratios</u></a> by dividing the total number of HTML (<code>Content-type: text/html</code>) requests from user agents associated with a given search or AI platform by the total number of requests for HTML content whose <code>Referer</code>: header contained a hostname associated with that platform. As of June 2025, we find that Google crawls websites about 14 times for every referral. But for AI companies, the <a href="https://radar.cloudflare.com/ai-insights#crawl-to-refer-ratio"><u>crawl-to-refer ratio</u></a> is orders of magnitude greater. In June 2025, <b>OpenAI’s crawl-to-referral ratio was 1,700:1, and Anthropic’s 73,000:1</b>. 
This clearly breaks the “crawl in exchange for traffic” relationship that previously existed between search crawlers and publishers. (Please note that this calculation reflects our best estimate, recognizing that traffic referred by native apps may not always be attributed to a provider due to a lack of a <code>Referer</code>: header, which may affect the ratio.)</p><p>And while sites can use robots.txt to tell these bots not to crawl their site, most don’t take this first step. We found that only about <a href="https://radar.cloudflare.com/ai-insights#ai-user-agents-found-in-robotstxt"><b><u>37% of the top 10,000 domains currently have a robots.txt file</u></b></a>, showing that robots.txt is underutilized in this age of evolving crawlers.</p><p>That’s where Cloudflare comes in. Our mission is to help build a better Internet, and a better Internet is one with a huge thriving ecosystem of independent publishers. So, we’re taking action to keep that ecosystem alive.</p>
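<p>The crawl-to-refer computation described above can be sketched in a few lines. The log schema here is hypothetical; the real measurement runs over Cloudflare's aggregate request logs:</p>

```python
def crawl_to_refer_ratio(logs, platform_agents, platform_hosts):
    """Approximate the ratio described above: HTML requests made *by* a
    platform's crawlers, divided by HTML requests *referred from* that
    platform's hostnames (both restricted to text/html responses)."""
    crawls = sum(1 for r in logs
                 if r["user_agent"] in platform_agents
                 and r["content_type"] == "text/html")
    refers = sum(1 for r in logs
                 if r["content_type"] == "text/html"
                 and any(h in r.get("referer", "") for h in platform_hosts))
    return crawls / refers if refers else float("inf")

# Toy log: three crawls by the platform's bot, one human visit it referred.
sample = [
    {"user_agent": "GPTBot", "content_type": "text/html", "referer": ""},
    {"user_agent": "GPTBot", "content_type": "text/html", "referer": ""},
    {"user_agent": "GPTBot", "content_type": "text/html", "referer": ""},
    {"user_agent": "Mozilla/5.0", "content_type": "text/html",
     "referer": "https://chatgpt.com/"},
]
ratio = crawl_to_refer_ratio(sample, {"GPTBot"}, {"chatgpt.com"})  # 3.0
```

<p>A ratio of 3 would mean three crawls per referral; the 1,700:1 and 73,000:1 figures above show how lopsided this exchange has become for AI platforms.</p>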
    <div>
      <h2>Giving ALL customers full control</h2>
      <a href="#giving-all-customers-full-control">
        
      </a>
    </div>
    <p>Protecting content creators isn’t new for Cloudflare. In July 2024, we gave everyone on the Cloudflare network a simple way to <a href="https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/"><u>block all AI scrapers with a single click</u></a> for free. We’ve already seen <b>more than 1 million customers enable this feature</b>, which has given us some interesting data.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2B8KAmaP6DrMEMW5YSjLYP/d9eb0f67a998b730373a27aa707ade9d/image5.png" />
          </figure><p>Since our last update, we can see that <code><b>Bytespider</b></code><b>, our previous top bot, has seen traffic volume decline 71.45% since the first week of July 2024</b>. During the same time, we saw an increased number of <code>Bytespider</code> requests that customers chose to specifically block. In contrast, <code>GPTBot</code> traffic volume has grown significantly as it has become more popular, now even surpassing traffic we see from big traditional tech players like Amazon and ByteDance.</p><p>The share of sites accessed by particular crawlers has gone down across the board since our last update. Previously, <code>Bytespider</code> accessed &gt;40% of websites protected by Cloudflare, but that number has dropped to only 9.37%. <code><b>GPTBot</b></code><b> has taken the top spot for most sites accessed</b>, but while its request volume has grown significantly (noted above), the share of sites it crawls has actually decreased since last year from 35.46% to 28.97%, with an increase in customers blocking.</p><table><tr><td><p>AI Bot</p></td><td><p>Share of Websites Accessed</p></td></tr><tr><td><p>GPTBot</p></td><td><p>28.97%</p></td></tr><tr><td><p>Meta-ExternalAgent</p></td><td><p>22.16%</p></td></tr><tr><td><p>ClaudeBot</p></td><td><p>18.80%</p></td></tr><tr><td><p>Amazonbot</p></td><td><p>14.56%</p></td></tr><tr><td><p>Bytespider</p></td><td><p>9.37%</p></td></tr><tr><td><p>GoogleOther</p></td><td><p>9.31%</p></td></tr><tr><td><p>ImageSiftBot</p></td><td><p>4.45%</p></td></tr><tr><td><p>Applebot</p></td><td><p>3.77%</p></td></tr><tr><td><p>OAI-SearchBot</p></td><td><p>1.66%</p></td></tr><tr><td><p>ChatGPT-User</p></td><td><p>1.06%</p></td></tr></table><p>And while AI Search and AI Assistant crawling related activity has exploded in popularity in the last 6 months, we still see their total traffic pale in comparison to AI training crawl activity, which has seen a <b>65% increase in traffic over the past 6 months</b>.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7nOWMQs8IzgS3RfrXHaVT1/b1b31024a92b70a3f39083b376bb3934/image4.png" />
          </figure><p>To this end, we launched <a href="https://blog.cloudflare.com/cloudflare-ai-audit-control-ai-content-crawlers/"><u>free granular auditing</u></a> in September 2024 to help customers understand which crawlers were accessing their content most often, and created simple templates to block all or specific crawlers. And in December 2024, we made it easy for publishers to automatically block <a href="https://blog.cloudflare.com/ai-audit-enforcing-robots-txt/"><u>crawlers that weren’t respecting robots.txt</u></a>. But we realized many sites didn’t have the time to create or manage their own robots.txt file. Today, we’re going two steps further.</p>
    <div>
      <h2>Step 1: fully managed robots.txt</h2>
      <a href="#step-1-fully-managed-robots-txt">
        
      </a>
    </div>
    <p>When it comes to managing your website’s visibility to search engine crawlers and other bots, the <code>robots.txt</code> file is a key player. This simple text file acts like a traffic controller, signaling to bots which parts of the website they should or should not access. We can think of <a href="https://www.cloudflare.com/learning/bots/what-is-robots-txt/"><u>robots.txt</u></a> as a "Code of Conduct" sign posted at a community pool, listing general dos and don'ts, according to the pool owner’s wishes. While the sign itself does not enforce the listed directives, well-behaved visitors will still read the sign and follow the instructions they see. On the other hand, poorly-behaved visitors who break the rules risk <a href="https://blog.cloudflare.com/ai-audit-enforcing-robots-txt/"><u>getting themselves banned</u></a>. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6oGxSRxy3sU88o4TZP7p42/aea1d7bbf5e57eb133ce8cdfae88dc37/image2.png" />
          </figure><p>What do these files actually look like? Take Google’s as an example, visible to anyone at <a href="https://www.google.com/robots.txt"><u>https://www.google.com/robots.txt</u></a>. Parsing its contents, you'll notice four directives in the set of instructions: <b>User-agent</b>, <b>Disallow</b>, <b>Allow</b>, and <b>Sitemap</b>. In a <code>robots.txt</code> file, the <b>User-agent</b> directive specifies which bots the rules apply to. The <b>Disallow</b> directive tells those bots which parts of the website they should avoid. In contrast, the <b>Allow</b> directive grants specific bots permission to access certain areas. Finally, the<a href="https://www.sitemaps.org/index.html"> <b>Sitemap</b> directive</a> shows a bot which pages it can reach, so that it won’t miss any important pages. The <a href="https://www.ietf.org/"><u>Internet Engineering Task Force (IETF)</u></a> formalized the definition and language for the Robots Exclusion Protocol in <a href="https://datatracker.ietf.org/doc/html/rfc9309"><u>RFC 9309</u></a>, specifying the exact syntax and precedence of these directives. It also outlines how crawlers should handle errors or redirects while stressing that compliance is <i>voluntary</i> and does not constitute access control. </p>
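<p>For illustration, a minimal robots.txt combining these directives might look like the following (a hypothetical example, not Google’s actual file):</p>
<pre><code>User-agent: *
Allow: /

User-agent: GPTBot
Disallow: /

Sitemap: https://www.example.com/sitemap.xml</code></pre>
<p>Here, every bot may crawl the whole site except <code>GPTBot</code>, which is asked to stay away entirely, and the Sitemap line points crawlers to the site’s page inventory.</p>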
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/79JML5EIN1f4NVzRankehO/20a2c99ccaca62e7718c9d66bb8585d5/image10.png" />
          </figure><p>Website owners should have agency over AI bot activity on their websites. We mentioned that only 37% of the top 10,000 domains on Cloudflare even have a robots.txt file. Of the robots.txt files that do exist, few include Disallow directives for the <a href="https://radar.cloudflare.com/ai-insights#ai-bot-crawler-traffic"><i><u>top</u></i><u> AI Bots</u></a> that we see on a daily basis. For instance, as of publication, <a href="https://radar.cloudflare.com/explorer?dataSet=robots_txt&amp;groupBy=user_agents%2Fdirective&amp;filters=directive%253DDISALLOW"><code><u>GPTBot</u></code><u> is only disallowed in 7.8% of the robots.txt files</u></a> found for the top domains; <code>Google-Extended</code> only shows up in 5.6%; <code>anthropic-ai</code>, <code>PerplexityBot</code>, <code>ClaudeBot</code>, and <code>Bytespider</code> each show up in under 5%. Furthermore, the difference between the 7.8% of Disallow directives for <code>GPTBot</code> and the ~5% of Disallow directives for other major AI crawlers suggests a gap between the desire to <a href="https://www.cloudflare.com/learning/ai/how-to-prevent-web-scraping/">prevent your content from being used for AI model training</a> and the proper configuration that accomplishes this by calling out bots like <code>Google-Extended</code>. (After all, there’s more to stopping AI crawlers than disallowing <code>GPTBot</code>.)</p><p>Along with viewing the most active bots and crawlers, Cloudflare Radar also shares weekly updates on how websites are handling <a href="https://radar.cloudflare.com/ai-insights?cf_target_id=3D982CE3E88C4E32F9D4AA79E7869F7C#ai-user-agents-found-in-robotstxt"><u>AI bots in their robots.txt files</u></a>. 
We can examine two snapshots below, one from <a href="https://radar.cloudflare.com/ai-insights?dateStart=2025-06-23&amp;dateEnd=2025-06-24"><u>June 2025</u></a> and the other from <a href="https://radar.cloudflare.com/ai-insights?dateStart=2025-01-26&amp;dateEnd=2025-02-01"><u>January 2025</u></a>:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/30Wc2jLvDqSMBKF5QxU2yc/f18b44d8ba9d11687c0224b40cf12675/image6.png" />
          </figure><p><sub><i>Radar snapshot from the week of June 23, 2025, showing the top AI user agents mentioned in the Disallow directive in robots.txt files across the top 10,000 domains. The 3 bots with the highest number of Disallows are GPTBot, CCBot, and facebookexternalhit.</i></sub></p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/T9krKSMLRud7sYgG7ahei/8632afeba6d22baa304ae9fd901e187a/image9.png" />
          </figure><p><sub><i>Radar snapshot from the week of January 26, 2025, showing the top AI user agents mentioned in the Disallow directive in robots.txt files across the top 10,000 domains. The 3 bots with the highest number of Disallows are GPTBot, CCBot, and anthropic-ai.</i></sub></p><p>From the above data, we also observe that fewer than 100 new robots.txt files have been added among the top domains between January and June. One visually striking change is the ratio of dark blue to light blue: compared to January, there is a steep decrease in “Partially Disallowed” permissions; websites are now flat-out choosing “Fully Disallowed” for the top AI crawlers, including <code>GPTBot</code>, <code>CCBot</code>, and <code>Google-Extended</code>. This underscores the changing landscape of web crawling, particularly the relationship of trust between website owners and AI crawlers.</p>
    <div>
      <h3>Putting up a guardrail with Cloudflare’s managed robots.txt</h3>
      <a href="#putting-up-a-guardrail-with-cloudflares-managed-robots-txt">
        
      </a>
    </div>
    <p>Many website owners have told us they’re in a tricky spot in this new era of AI crawlers. They’ve poured time and effort into creating original content, have published it on their own sites, and naturally want it to reach as many people as possible. To do that, website owners make their sites accessible to search engine crawlers, which index the content and make it discoverable in search results. But with the rise of AI-powered crawlers, that same content is now being scraped not just for indexing, but also to train AI models, often without the creator’s explicit consent. Take <code>Googlebot</code>, for example: it’s an absolute requirement for most website owners to allow for SEO. But Google crawls with user agent <code>Googlebot</code> for both SEO <i>and</i> AI training purposes. Specifically disallowing <a href="https://developers.google.com/search/docs/crawling-indexing/google-common-crawlers#google-extended"><code><u>Google-Extended</u></code></a> (but not <code>Googlebot</code>) in your robots.txt file is what communicates to Google that you do not want your content to be crawled to feed AI training.</p><p>So, what if you don’t want your content to serve as training data for the next AI model, but don’t have the time to manually maintain an up-to-date robots.txt file? <b>Enter Cloudflare’s new managed robots.txt offering.</b> Once enabled, Cloudflare will automatically update your existing robots.txt or create a robots.txt file on your site that includes directives asking popular AI bot operators to not use your content for AI model training. For instance, <b>Cloudflare’s managed robots.txt signals your preference to </b><code><b>Google-Extended</b></code><b> and </b><a href="https://support.apple.com/en-us/119829"><code><b><u>Applebot-Extended</u></b></code></a><b>, amongst others, that they should not crawl your site for AI training,</b> while keeping your domain(s) SEO-friendly.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2SLxL9LMN1IK2WXOIq8ezP/786db3e1cbc24b1cce4c337b8136d3a7/image3.png" />
          </figure><p><sup><i>Cloudflare dashboard snapshot of the new managed robots.txt activation toggle </i></sup></p><p>This feature is available to all customers, meaning anyone can <a href="https://developers.cloudflare.com/bots/additional-configurations/managed-robots-txt/"><u>enable this today</u></a> from the Cloudflare dashboard. Once enabled, website owners who previously had no robots.txt file will now have Cloudflare’s managed bot directives live on their website. What about website owners who already have a robots.txt file? The contents of Cloudflare’s managed robots.txt will be <i>prepended</i> to site owners’ existing file. This way, their existing Disallow directives – and the time and rationale put into customizing this file – are honored, while still ensuring the website has AI crawler guardrails managed by Cloudflare.</p><p>As the AI bot landscape changes with new bots on the rise, Cloudflare will keep our customers a step ahead by updating the directives in our managed robots.txt, so they don’t have to worry about maintaining things on their own. Once enabled, customers won’t need to take any action for updates to the managed robots.txt content to go live on their site. </p><p>We believe that managing crawling is key to protecting the open Internet, so we’ll also be encouraging every new site that onboards to Cloudflare to enable our managed robots.txt. When you onboard a new site, you’ll see the following options for managing AI crawlers:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6l4RpmHHf0OGP44XyDnZra/66c30bb8080d3107ab93af55dc6a8c6e/Screenshot_2025-06-30_at_3.59.54%C3%A2__PM.png" />
          </figure><p>This makes it effortless to ensure that <b>every new customer or domain onboarded to Cloudflare gives clear directives for how they want their content used.</b></p>
    <div>
      <h3>Under the hood: technical implementation</h3>
      <a href="#under-the-hood-technical-implementation">
        
      </a>
    </div>
    <p>To implement this feature, we developed a new module that intercepts all inbound HTTP requests for <code>/robots.txt</code>. For all such requests, we’ll check whether the zone has opted in to use Cloudflare’s managed robots.txt by reading a value from our <a href="https://blog.cloudflare.com/introducing-quicksilver-configuration-distribution-at-internet-scale/"><u>distributed key-value store</u></a>. If it has, the module responds with Cloudflare’s managed robots.txt directives, prepended to the origin’s robots.txt if such a file already exists. We prepend so we can add a generalized header that instructs all bots on the customer’s preferences for data use, as defined in the <a href="https://www.ietf.org/archive/id/draft-it-aipref-attachment-00.html#name-introduction"><u>IETF AI preferences proposal</u></a>. Note that in robots.txt, the <a href="https://datatracker.ietf.org/doc/html/rfc9309#section-2.2.2"><u>most specific match</u></a> <i>must</i> always be used, and since our disallow expressions are scoped to cover everything, we can ensure that a directive we prepend will never conflict with a more targeted customer directive. If the customer has <i>not</i> enabled this feature, the request is forwarded to the origin server as usual, serving whatever the customer has written in their own robots.txt file. (While caching the origin’s robots.txt could reduce latency by eliminating a round trip to the origin, the impact on overall page load times would be minimal, as robots.txt requests comprise a small fraction of total traffic. Adding cache update/invalidation logic would introduce complexity with limited benefit, so we prioritized functionality and reliability in our implementation.)</p>
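<p>As a simplified sketch of this decision logic (the directive text and helper names below are illustrative stand-ins, not the production module or the actual managed file):</p>

```python
# Illustrative sketch of the /robots.txt interception described above. The
# directive text and helper names are hypothetical; the real module runs
# inside Cloudflare's proxy and reads opt-in state from the Quicksilver store.
MANAGED_DIRECTIVES = (
    "# Managed by Cloudflare (illustrative sketch)\n"
    "User-agent: GPTBot\nDisallow: /\n\n"
    "User-agent: Google-Extended\nDisallow: /\n"
)

def handle_robots_request(zone_opted_in: bool, fetch_origin_robots) -> str:
    """Serve the managed directives, prepended to any existing origin file."""
    origin_body = fetch_origin_robots()  # returns "" when the origin has no file
    if not zone_opted_in:
        return origin_body  # pass through untouched
    if origin_body:
        # Prepending preserves the customer's own rules: under RFC 9309 the
        # most specific matching rule wins, so broad managed rules never
        # override a more targeted customer directive.
        return MANAGED_DIRECTIVES + "\n" + origin_body
    return MANAGED_DIRECTIVES

print(handle_robots_request(True, lambda: "User-agent: *\nAllow: /\n"))
```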
    <div>
      <h2>Step 2: block, but only where you show ads</h2>
      <a href="#step-2-block-but-only-where-you-show-ads">
        
      </a>
    </div>
    <p>Adding an entry to your robots.txt file is the first step to telling AI bots not to crawl you. But robots.txt is an honor system. Nothing forces bots to follow it. That’s why we introduced our <a href="https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/"><u>one-click managed rule</u></a> to block all AI bots across your zone. However, some customers want AI bots to visit certain pages, like developer or support documentation. For customers who are hesitant to block everywhere, we have a brand-new option: let us detect when ads are shown on a hostname, and we will block AI bots ONLY on that hostname. Here’s how we do it.</p><p>First, we use multiple techniques to identify whether a request is coming from an AI bot. The easiest technique is to identify well-behaved crawlers that publicly declare their user agent and use dedicated IP ranges. Often we work directly with these bot makers to add them to our <a href="https://radar.cloudflare.com/traffic/verified-bots"><u>Verified Bot list</u></a>.</p><p>Many bot operators act in good faith by publicly publishing their user agents, or even <a href="https://blog.cloudflare.com/verified-bots-with-cryptography/"><u>cryptographically verifying their bot requests</u></a> directly with Cloudflare. Unfortunately, some attempt to appear like a real browser by using a spoofed user agent. Our global machine learning models have long recognized this activity as bot traffic, even when operators lie about their user agent. When bad actors attempt to crawl websites at scale, they generally use tools and frameworks that we’re able to fingerprint, and we draw on the more than 57 million requests per second that Cloudflare’s network handles on average to understand how much we should trust each fingerprint. 
We compute global aggregates across many signals, and based on these signals, our models are able to consistently and <a href="https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/"><u>appropriately flag traffic from evasive AI bots</u></a>.</p><p>When we see a request from an AI bot, our system checks if we have previously identified ads in the response served by the target page. To do this, we inspect the “response body” — the raw HTML code of the web page being sent back.  After parsing the HTML document, we perform a comprehensive scan for code patterns commonly found in <a href="https://support.google.com/adsense/answer/9183549?hl=en#:~:text=An%20ad%20unit%20is%20one,flexibility%20in%20terms%20of%20customization."><u>ad units</u></a>, which signals to us that the page is serving an ad. Examples of such code would be:</p>
            <pre><code>&lt;div class="ui-advert" data-role="advert-unit" data-testid="advert-unit" data-ad-format="takeover" data-type="" data-label="" style=""&gt;
&lt;script&gt;
....
&lt;/script&gt;
&lt;/div&gt;</code></pre>
            <p>Here, the <code>div</code> element has the <code>ui-advert</code> class commonly used for advertising. Similarly, references to widely used ad servers like Google Syndication are a strong signal as well, such as the following:</p>
            <pre><code>&lt;link rel="dns-prefetch" href="https://pagead2.googlesyndication.com/"&gt;

&lt;script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js?client=ca-pub-1234567890123456" crossorigin="anonymous"&gt;&lt;/script&gt;</code></pre>
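            <p>A minimal sketch of this kind of pattern scan might look like the following (the filter lists are tiny, made-up stand-ins; the production system streams response chunks through the LOL HTML parser rather than regex-scanning whole documents):</p>

```python
# Sketch of scanning HTML for ad markup (simplified: the production system
# streams the response through the LOL HTML parser and matches a reduced,
# EasyList-style set of ~400 filters; the lists below are tiny stand-ins).
import re

CLASS_FILTERS = ("ui-advert", "ad-unit", "adsbygoogle")             # ad class tokens
URL_FILTERS = ("pagead2.googlesyndication.com", "doubleclick.net")  # ad-server hosts

def page_serves_ads(html: str) -> bool:
    """True if the HTML contains a known ad class or references an ad server."""
    for token in CLASS_FILTERS:
        # Look for the class token inside a class="..." attribute.
        if re.search(r'class="[^"]*\b' + re.escape(token) + r'\b[^"]*"', html):
            return True
    return any(host in html for host in URL_FILTERS)

print(page_serves_ads('<div class="ui-advert" data-role="advert-unit"></div>'))  # True
print(page_serves_ads('<p>No ads here</p>'))  # False
```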
            <p>By streaming and directly parsing small chunks of the response using our ultra-fast <a href="https://blog.cloudflare.com/html-parsing-2/#lol-html"><u>LOL HTML parser</u></a>, we can perform scans without adding any latency to the inspected response.</p><p>So as not to reinvent the wheel, we are adopting techniques similar to those that ad blockers have been using for years. Ad blockers fundamentally perform two separate tasks to block advertisements in a browser. The first is to block the browser from fetching resources from ad servers, and the second is to suppress displaying HTML elements that contain ads. For this, ad blockers rely on large filter lists such as <a href="https://easylist.to/index.html"><u>EasyList</u></a> that contain both so-called URL block filters that match outgoing request URLs against a set of patterns, and block them if they match one of the filters, and CSS selectors that are designed to match HTML ad elements.</p><p>We can use both of these techniques to detect if an HTML response contains ads by checking external resources (e.g. content referenced by HREF or SCRIPT tags) against URL block filters, and the HTML elements themselves against CSS selectors. Because we do not actually need to block every single advertisement on a site, but rather detect the overall presence of ads on a site, we can achieve the same detection efficacy when shrinking the number of CSS and URL filters down from more than 40,000 in EasyList to the 400 most commonly seen ones to increase our computational efficiency.</p><p>Because some sites load ads dynamically rather than directly in the returned HTML (partially to avoid ad blocking), we enrich this first information source with data from <a href="https://developers.cloudflare.com/fundamentals/reference/policies-compliances/content-security-policies/"><u>Content Security Policy (CSP)</u></a> reports. 
The Content Security Policy standard is a security mechanism that helps web developers control the resources (like scripts, stylesheets, and images) a browser is allowed to load for a specific web page, and browsers send reports about loaded resources to a CSP management system, which for many sites is Cloudflare’s <a href="https://developers.cloudflare.com/page-shield/"><u>Page Shield</u></a> product. These reports allow us to relate scripts loaded from ad servers directly with page URLs. Both of these information sources are consumed by our <a href="https://www.cloudflare.com/en-gb/learning/security/glossary/what-is-endpoint/"><u>endpoint management service</u></a>, which then matches incoming requests against hostnames that we already know are serving ads.</p><p>We do all of this on every request for any customer who opts in, even free customers. </p><p>To enable this feature, simply navigate to the <a href="https://dash.cloudflare.com/?to=/:account/:zone/security/bots/configure"><u>Security &gt; Settings &gt; Bots</u></a> section of the Cloudflare dashboard, and choose either <code>Block on pages with Ads</code> or <code>Block Everywhere</code>.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/yoGKnsD7fuG9K8MysCMHl/91fb4bb69625d8c85a8dcf4cfb21f6de/unnamed__1_.png" />
          </figure>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/64xCpJrlgY1WtsNI0CeeT5/975e6a329b605e11445faafa038181aa/unnamed__2_.png" />
          </figure>
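<p>The CSP-report enrichment described above can be sketched roughly as follows (field names follow the standard CSP reporting JSON; the ad-server list is a hypothetical stand-in, and the real pipeline feeds matches into our endpoint management service):</p>

```python
# Sketch of relating CSP-style reports to ad serving (simplified). The field
# names follow the standard CSP report JSON; AD_HOSTS is a hypothetical
# stand-in for the real ad-server filter list.
import json
from urllib.parse import urlparse

AD_HOSTS = {"pagead2.googlesyndication.com", "securepubads.g.doubleclick.net"}

def hostname_serving_ads(report_json: str):
    """Return the page's hostname if the reported resource came from an ad server."""
    report = json.loads(report_json)["csp-report"]
    resource_host = urlparse(report["blocked-uri"]).hostname
    if resource_host in AD_HOSTS:
        return urlparse(report["document-uri"]).hostname
    return None

report = json.dumps({"csp-report": {
    "document-uri": "https://example.com/article",
    "blocked-uri": "https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js",
}})
print(hostname_serving_ads(report))  # example.com
```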
    <div>
      <h2>The AI bot hunt: finding and identifying bots</h2>
      <a href="#the-ai-bot-hunt-finding-and-identifying-bots">
        
      </a>
    </div>
    <p>The AI bot landscape has exploded, and it continues to grow rapidly as more operators come online. At Cloudflare, our team of security researchers is constantly identifying and classifying different AI-related crawlers and scrapers across our network. </p><p>There are two major ways in which we track AI bots and identify those that are poorly behaved:</p><p>1. Our customers play a crucial role by directly submitting reports of misbehaving AI bots that may not yet be classified by Cloudflare. (If you have an AI bot that comes to mind here, we’d love for you to let us know through our <a href="https://docs.google.com/forms/d/14bX0RJH_0w17_cAUiihff5b3WLKzfieDO4upRlo5wj8/"><u>bots submission form</u></a> today.) Once such a bot comes to our attention, our security analysts investigate to determine how it should be categorized.</p><p>2. We’re able to derive insights by analyzing the massive volume of customer traffic that we observe. Specifically, we can see which AI agents visit which websites and when, drawing out trends or patterns that might make a website owner want to disallow a given AI bot. This bird’s-eye view of abusive AI bot behavior was paramount as we started to determine the content of a managed robots.txt.</p>
    <div>
      <h2>What’s next?</h2>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>Our new <a href="https://developers.cloudflare.com/bots/additional-configurations/managed-robots-txt/"><u>managed robots.txt</u></a> and blocking AI bots on pages with ads features are available to <i>all Cloudflare customers</i>, including everyone on a Free plan. We encourage customers to start using them today – to take control over how the content on your website gets used. Looking ahead, Cloudflare will monitor the <a href="https://ietf-wg-aipref.github.io/drafts/draft-ietf-aipref-vocab.html"><u>IETF’s pending proposal</u></a> allowing website publishers to control how automated systems use their content and update our managed robots.txt accordingly. We will also continue to provide more granular control around AI bot management and investigate new distinguishing signals as AI bots become more and more precise. And if you’ve seen suspicious behavior from an AI scraper, contribute to the Internet ecosystem by <a href="https://docs.google.com/forms/d/14bX0RJH_0w17_cAUiihff5b3WLKzfieDO4upRlo5wj8/"><u>letting us know</u></a>!</p> ]]></content:encoded>
            <category><![CDATA[Pay Per Crawl]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[Impact]]></category>
            <guid isPermaLink="false">44HBJInoaQRMqVRmSaqjg6</guid>
            <dc:creator>Jin-Hee Lee</dc:creator>
            <dc:creator>Dipunj Gupta</dc:creator>
            <dc:creator>Brian Mitchell</dc:creator>
            <dc:creator>Reid Tatoris</dc:creator>
            <dc:creator>Henry Clausen</dc:creator>
        </item>
        <item>
            <title><![CDATA[API Endpoint Management and Metrics are now GA]]></title>
            <link>https://blog.cloudflare.com/api-management-metrics/</link>
            <pubDate>Thu, 22 Sep 2022 13:00:00 GMT</pubDate>
            <description><![CDATA[ API Shield customers can save, update, and monitor the performance of API endpoints ]]></description>
            <content:encoded><![CDATA[ <p></p><p>The Internet is an endless flow of conversations between computers. These conversations, the  constant exchange of information from one computer to another, are what allow us to interact with the Internet as we know it. <a href="https://www.cloudflare.com/learning/security/api/what-is-an-api/">Application Programming Interfaces (APIs)</a> are the vital channels that carry these conversations, and their usage is quickly growing: in fact, <a href="/landscape-of-api-traffic/">more than half of the traffic</a> handled by Cloudflare is for APIs, and this is increasing twice as fast as traditional web traffic.</p><p>In March, we announced that we’re expanding our API Shield into a full <a href="/api-gateway/">API Gateway</a> to make it easy for our customers to protect and manage those conversations. We already offer several features that allow you to secure your endpoints, but there’s more to endpoints than their security. It can be difficult to keep track of many endpoints over time and understand how they’re performing. Customers deserve to see what’s going on with their API-driven domains and have the ability to <i>manage</i> their endpoints.</p><p>Today, we’re excited to announce that the ability to save, update, and <a href="https://www.cloudflare.com/application-services/solutions/app-performance-monitoring/">monitor</a> the performance of all your API endpoints is now generally available to API Shield customers. This includes key performance metrics like latency, error rate, and response size that give you insights into the overall health of your API endpoints.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2l4QNVWkbubZU8K4rh35T0/d7d325b5626506b6de3871795aeb4945/image3-19.png" />
            
            </figure>
    <div>
      <h3>A Refresher on APIs</h3>
      <a href="#a-refresher-on-apis">
        
      </a>
    </div>
    <p>The bar for what we expect an application to do for us has risen tremendously over the past few years. When we open a browser, app, or IoT device, we expect to be able to connect to data instantly, compare dozens of flights within seconds, choose a menu item from a food delivery app, or see the weather for ten locations at once.</p><p>How are applications able to provide this kind of dynamic engagement for their users? They rely on APIs, which provide access to data and services—either from the application developer or from another company. APIs are fundamental in how computers (or services) talk to each other and exchange information.</p><p>You can think of an API as a waiter: say a customer orders a delicious bowl of Mac n Cheese. The waiter accepts this order from the customer, communicates the request to the chef in a format the chef can understand, and then delivers the Mac n Cheese back to the customer (assuming the chef has the ingredients in stock). The waiter is the crucial <i>channel of communication</i>, which is exactly what the API does.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4Tz8vFVlVKHWMjdXpCymyb/861e8f50df61cf032e2bf8b50012f069/image7-2.png" />
            
            </figure>
    <div>
      <h3>Managing API Endpoints</h3>
      <a href="#managing-api-endpoints">
        
      </a>
    </div>
    <p>The first step in managing APIs is to get a complete list of all the <a href="https://www.cloudflare.com/learning/learning/security/api/what-is-api-discovery/">endpoints exposed to the internet</a>. <a href="https://developers.cloudflare.com/api-shield/security/api-discovery/"><i>API Discovery</i></a> automatically does this for any traffic flowing through Cloudflare. Undiscovered APIs can’t be monitored by security teams (since they don't know about them) and they're thus less likely to have proper security policies and best practices applied. However, customers have told us they also want the ability to manually add and manage APIs that are not yet deployed, or they want to ignore certain endpoints (for example those in the process of deprecation). Now, API Shield customers can choose to save endpoints found by Discovery or manually add endpoints to API Shield.</p><p>But security vulnerabilities aren’t the only risk or area of concern with APIs – they can be painfully slow or connections can be unsuccessful. We heard questions from our customers such as: what are my most popular endpoints? Is this endpoint significantly slower than it was yesterday? Are any endpoints returning errors that may indicate a problem with the application?</p><p>That’s why we built Performance Metrics into API Shield, which allows our customers to quickly answer these questions themselves with real-time data.</p>
    <div>
      <h3>Prioritizing Performance</h3>
      <a href="#prioritizing-performance">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/gOOOfuquNBYu7jErTcASu/68d914e2fae4dc560d002944a21a54d2/image2-28.png" />
            
            </figure><p>Once you’ve discovered, saved, or removed endpoints, you want to know what’s going well and what’s not. To end-users, a huge part of what defines the experience as “going well” is good performance. Poor performance can lead to a frustrating experience: when you’re shopping online and press a button to check out, you don’t want to wait around for minutes for the page to load. And you certainly never want to see a dreaded error symbol telling you that you can’t get what you came for.</p><p>Exposing performance metrics of API endpoints puts concrete numerical data into your developers’ hands to tell you how things are going. When things are going poorly, these dashboard metrics will point out exactly which aspect of performance is causing concern: maybe you expected to see a spike in requests, but find out that request count is normal and latency is just higher than usual.</p><p>Empowering our customers to make data-driven decisions to better manage their APIs ends up being a win for our customers <i>and</i> our customers’ customers, who expect to seamlessly engage with the domain’s APIs and get exactly what they came for.</p>
    <div>
      <h3>Management and Performance Metrics in the Dashboard</h3>
      <a href="#management-and-performance-metrics-in-the-dashboard">
        
      </a>
    </div>
    <p>So, what’s available today? Log onto your Cloudflare dashboard, go to the domain-level Security tab, and open up the API Shield page. Here, you’ll see the Endpoint Management tab, which shows you all the API endpoints that you’ve saved, alongside placeholders for metrics that will soon be gathered.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/11jqaqDrMTvF4L7rgtydlY/2e403ac4c5ba5d1f028cb390621f81a3/image10-1.png" />
            
            </figure><p>Here you can easily delete endpoints you no longer want to track, or manually add additional endpoints. You can also export schemas for each host to share internally or externally.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6Ka18IINPtkTxeQjGIvoN6/062065fe986c6be91ce5f082b1393932/image9-3.png" />
            
            </figure><p>Once you’ve saved the endpoints that you want to keep tabs on, Cloudflare will start collecting data on their performance and make it available to you as soon as possible.</p><p>In Endpoint Management, you can see a few summary metrics in the collapsed view of each endpoint, including recommended rate limits, average latency, and error rate. It can be difficult to tell whether things are going well or not just from seeing a value alone, so we added sparklines that show <i>relative</i> performance, comparing an endpoint’s current metrics with its usual or previous data.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3DIMD3RnXplhBT7ghBPbbk/7db67978569894327e29dadd883e475d/image5-10.png" />
            
            </figure><p>If you want to view further details about a given endpoint, you can expand it for additional metrics such as response size and errors separated by 4xx and 5xx. The expanded view also allows you to view all metrics at a single timestamp by hovering over the charts.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5dyhRVO579d42OVBOlptj5/cd53f412261a43704ca07fb8fdf799ea/unnamed-1.png" />
            
            </figure><p>For each saved endpoint, customers can see the following metrics:</p><ul><li><p><b>Request count</b>: total number of requests to the endpoint over time.</p></li><li><p><b>Rate limiting recommendation</b> per 10 minutes, which is guided by the request count.</p></li><li><p><b>Latency</b>: average origin response time, in milliseconds (ms). How long does it take from the moment a visitor makes a request to the moment the visitor gets a response back from the origin?</p></li><li><p><b>Error rate</b> vs. overall traffic: grouped by 4xx, 5xx, and their sum.</p></li><li><p><b>Response size</b>: average size of the response (in bytes) returned to the request.</p></li></ul><p>You can toggle between viewing these metrics on a 24-hour period or a 7-day period, depending on the scale on which you’d like to view your data. And in the expanded view, we provide a percentage difference between the averages of the current vs. the previous period. For example, say I’m viewing my metrics on a 24-hour timeline. My average latency yesterday was 10 ms, and my average latency today is 30 ms, so the dashboard shows a 200% increase. We also use anomaly detection to bring attention to endpoints that have concerning performance changes.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7sptSfdramyWzuaOyJqaD1/f3c7f8811c0cf43bd4bf17fbbdeba45a/image6-5.png" />
            
            </figure>
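The period-over-period comparison described above is simple arithmetic. A minimal sketch (not Cloudflare's internal implementation) that reproduces the latency example from the text:

```python
def percent_change(previous_avg: float, current_avg: float) -> float:
    """Percentage difference between the current period's average and
    the previous period's, as shown in the expanded endpoint view."""
    if previous_avg == 0:
        raise ValueError("previous-period average must be non-zero")
    return (current_avg - previous_avg) / previous_avg * 100.0

# Yesterday's average latency was 10 ms, today's is 30 ms:
increase = percent_change(10.0, 30.0)  # 200.0, i.e. a 200% increase
```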
    <div>
      <h3>Additional improvements to Discovery and Schema Validation</h3>
      <a href="#additional-improvements-to-discovery-and-schema-validation">
        
      </a>
    </div>
    <p>As part of making endpoint management GA, we’re also adding two additional enhancements to API Shield.</p><p>First, <b><i>API Discovery now accepts cookies</i></b> — in addition to authorization headers — to discover endpoints and suggest rate limiting thresholds. Previously, you could only identify an API session with HTTP headers, which didn’t allow customers to protect endpoints that use cookies as session identifiers. Now these endpoints can be protected as well. Simply go to the API Shield tab in the dashboard, choose edit session identifiers, and either change the type or click <code>Add additional identifier</code>.</p>
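To illustrate what a cookie-based session identifier means in practice, here is a sketch of extracting one from an incoming request. The header and cookie names (<code>Authorization</code>, <code>session_id</code>) are examples only — in API Shield you configure which header or cookie to use:

```python
from http.cookies import SimpleCookie
from typing import Optional

def extract_session_id(headers: dict,
                       cookie_name: str = "session_id") -> Optional[str]:
    """Return a session identifier from either an Authorization header
    or a named cookie. Illustrative only; names are configurable."""
    auth = headers.get("Authorization")
    if auth:
        return auth
    cookie_header = headers.get("Cookie")
    if cookie_header:
        jar = SimpleCookie()
        jar.load(cookie_header)
        if cookie_name in jar:
            return jar[cookie_name].value
    return None
```

Grouping traffic by a value like this is what lets Discovery count distinct sessions per endpoint and derive a sensible rate limit recommendation from it.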
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1n4RtAi8Z5BPmkZDCZAqt3/6e33650b226257918ee927b30b1ee371/image1-34.png" />
            
            </figure><p>Second, we added the ability to <b><i>validate the body of requests via Schema Validation</i></b> for all customers. Schema Validation allows you to provide an OpenAPI schema (a template for your API traffic) and have Cloudflare block non-conformant requests as they arrive at our edge. Previously, you provided specific headers, cookies, and other features to validate. Now that we can validate the body of requests, you can use Schema Validation to confirm every element of a request matches what is expected. If a request contains strange information in the payload, we’ll notice. Note: <i>customers who have already uploaded schemas will need to re-upload to take advantage of body validation.</i></p><p>Take a look at our <a href="https://developers.cloudflare.com/api-shield/">developer documentation</a> for more details on both of these features.</p>
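For a sense of what body validation enforces, here is a minimal OpenAPI 3 fragment of the kind Schema Validation consumes. The path and field names are hypothetical; the point is that the <code>requestBody</code> schema, not just parameters and headers, is now checked at the edge:

```yaml
openapi: 3.0.0
info:
  title: Example API        # hypothetical schema for illustration
  version: "1.0"
paths:
  /orders:
    post:
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required: [item_id, quantity]
              properties:
                item_id:
                  type: string
                quantity:
                  type: integer
                  minimum: 1
              additionalProperties: false  # reject unexpected payload fields
      responses:
        "200":
          description: Order accepted
```

With this schema uploaded, a POST to <code>/orders</code> whose payload is missing <code>quantity</code>, or which smuggles in an extra field, would be non-conformant and could be blocked before it reaches the origin.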
    <div>
      <h3>Get started</h3>
      <a href="#get-started">
        
      </a>
    </div>
    <p>Endpoint Management, performance metrics, schema exporting, discovery via cookies, and schema body validation are all available now for all API Shield customers. To use them, log into the Cloudflare dashboard, click on <code>Security</code> in the navigation bar, and choose API Shield. Once API Shield is enabled, you’ll be able to start discovering endpoints immediately. You can also use all features <a href="https://api.cloudflare.com/#api-shield-settings-properties">through our API</a>.</p><p>If you aren’t yet protecting a website with Cloudflare, it only takes a few minutes to <a href="https://dash.cloudflare.com/sign-up">sign up</a>.</p> ]]></content:encoded>
            <category><![CDATA[GA Week]]></category>
            <category><![CDATA[General Availability]]></category>
            <category><![CDATA[API]]></category>
            <category><![CDATA[API Gateway]]></category>
            <category><![CDATA[API Security]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[API Shield]]></category>
            <guid isPermaLink="false">7Cxx3FtFMuLgbsPe1lYNge</guid>
            <dc:creator>Jin-Hee Lee</dc:creator>
        </item>
    </channel>
</rss>