
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Sat, 11 Apr 2026 01:42:32 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Security Week 2025: in review]]></title>
            <link>https://blog.cloudflare.com/security-week-2025-wrap-up/</link>
            <pubDate>Mon, 24 Mar 2025 13:05:00 GMT</pubDate>
            <description><![CDATA[ Security Week 2025 has officially come to a close. Our updates for the week included a deep dive on our AI offering, a unified navigation experience, and an introduction to our AI Agent Cloudy. ]]></description>
            <content:encoded><![CDATA[ <p>Thank you for following along with another Security Week at Cloudflare. We’re extremely proud of the work our team does to make the Internet safer and to help meet the challenge of emerging threats. As our CISO Grant Bourzikas outlined in his <a href="https://blog.cloudflare.com/welcome-to-security-week-2025/"><u>kickoff post</u></a> this week, security teams are facing a landscape of rapidly increasing complexity introduced by vendor sprawl, an “AI Boom”, and an ever-growing surface area to protect.</p><p>As we continuously work to meet new challenges, Innovation Weeks like Security Week give us an invaluable opportunity to share our point of view and engage with the wider Internet community. Cloudflare’s mission is to <i>help</i> build a better Internet. We want to help safeguard the Internet from the arrival of quantum supercomputers, help protect the livelihood of content creators from <a href="https://www.cloudflare.com/learning/ai/how-to-prevent-web-scraping/">unauthorized AI scraping</a>, help raise awareness of the latest Internet threats, and help find new ways to help reduce the reuse of compromised passwords. Solving these challenges will take a village. We’re grateful to everyone who has engaged with us on these issues via social media, contributed to <a href="https://github.com/cloudflare"><u>our open source repositories</u></a>, and reached out through our <a href="https://www.cloudflare.com/partners/technology-partners/"><u>technology partner program</u></a> to work with us on the issues most important to them. For us, that’s the best part.</p><p>Here’s a recap of this week’s announcements:</p>
    <div>
      <h3>Helping make the Internet safer</h3>
      <a href="#helping-make-the-internet-safer">
        
      </a>
    </div>
    <table><tr><td><p><b>Title</b></p></td><td><p><b>Excerpt</b></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/post-quantum-zero-trust/"><u>Conventional cryptography is under threat. Upgrade to post-quantum cryptography with Cloudflare Zero Trust</u></a></p></td><td><p>We’re thrilled to announce that organizations can now protect their sensitive corporate network traffic against quantum threats by tunneling it through Cloudflare’s Zero Trust platform.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/how-cloudflare-is-using-automation-to-tackle-phishing/"><u>How Cloudflare is using automation to tackle phishing head on</u></a></p></td><td><p>How Cloudflare is using threat intelligence and our Developer Platform products to automate phishing abuse reports.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/advancing-account-security-as-part-of-cloudflare-commitment-to-cisa-secure-by-design-pledge/"><u>Advancing account security as part of Cloudflare’s commitment to CISA’s Secure by Design pledge</u></a></p></td><td><p>Cloudflare has made significant progress in boosting multi-factor authentication (MFA) adoption. With the addition of Apple and Google social logins, we’ve made secure access easier for our users.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/email-security-now-available-for-free-for-political-parties-and-campaigns/"><u>Email Security now available for free for political parties and campaigns through Cloudflare for Campaigns</u></a></p></td><td><p>We’re excited to announce that Cloudflare for Campaigns now includes Email Security, adding an extra layer of protection to email systems that power political campaigns.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/enhanced-security-and-simplified-controls-with-automated-botnet-protection/"><u>Enhanced security and simplified controls with automated botnet protection, cipher suite selection, and URL Scanner updates</u></a></p></td><td><p>Enhanced security, simplified control! This Security Week, Cloudflare unveils automated botnet protection, flexible cipher suites, and an upgraded URL Scanner.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/password-reuse-rampant-half-user-logins-compromised/"><u>Password reuse is rampant: nearly half of user logins are compromised</u></a></p></td><td><p>Nearly half of login attempts across websites protected by Cloudflare involved leaked credentials. The pervasive issue of password reuse is enabling automated bot attacks on a massive scale.</p></td></tr></table>
    <div>
      <h3>Threat research from the network that sees the most threats </h3>
      <a href="#threat-research-from-the-network-that-sees-the-most-threats">
        
      </a>
    </div>
    <table><tr><td><p><b>Title</b></p></td><td><p><b>Excerpt</b></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/threat-events-platform/"><u>Unleashing improved context for threat actor activity with our Cloudforce One threat events platform</u></a></p></td><td><p>Gain real-time insights with our new threat events platform. This tool empowers your cybersecurity defense with actionable intelligence to stay ahead of attacks and protect your critical assets.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/cloudflare-security-posture-management/"><u>One platform to manage your company’s predictive security posture with Cloudflare</u></a></p></td><td><p>Cloudflare introduces a single platform for unified security posture management, helping protect SaaS and web applications deployed across various environments.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/monitoring-and-forensics/"><u>Cloudflare enables native monitoring and forensics with Log Explorer and custom dashboards</u></a></p></td><td><p>We are excited to announce support for Zero Trust datasets, and custom dashboards where customers can monitor critical metrics for suspicious or unusual activity</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/upgraded-turnstile-analytics-enable-deeper-insights-faster-investigations/"><u>Introducing new Turnstile Analytics: Gain insight into your visitor traffic, bot behavior patterns, traffic anomalies, and attack attributes.</u></a></p></td><td><p>Introducing new Turnstile Analytics: gain insight into your visitor traffic, bot behavior patterns, traffic anomalies, and attack attributes.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/cloudflare-radar-ddos-leaked-credentials-bots/"><u>Extending Cloudflare Radar’s security insights with new DDoS, leaked credentials, and bots datasets</u></a></p></td><td><p>For Security Week 2025, we are adding several new DDoS-focused graphs, new insights into leaked credential trends, and a new Bots page to Cloudflare Radar.</p></td></tr></table>
    <div>
      <h3>Securing models and guarding against AI threats </h3>
      <a href="#securing-models-and-guarding-against-ai-threats">
        
      </a>
    </div>
    <table><tr><td><p><b>Title</b></p></td><td><p><b>Excerpt</b></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/cloudflare-for-ai-supporting-ai-adoption-at-scale-with-a-security-first-approach/"><u>Cloudflare for AI: supporting AI adoption at scale with a security-first approach</u></a></p></td><td><p>With Cloudflare for AI, developers, security teams, and content creators can leverage Cloudflare’s network and portfolio of tools to secure, observe, and make AI applications resilient and safe to use.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/how-we-train-ai-to-uncover-malicious-javascript-intent-and-make-web-surfing-safer/"><u>How we train AI to uncover malicious JavaScript intent and make web surfing safer</u></a></p></td><td><p>Learn more about how Cloudflare developed an AI model to uncover malicious JavaScript intent using a Graph Neural Network, from pre-processing data to inferencing at scale.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/an-early-look-at-cryptographic-watermarks-for-ai-generated-content/"><u>An early look at cryptographic watermarks for AI-generated content</u></a></p></td><td><p>It's hard to tell the difference between web content produced by humans and web content produced by AI. We're taking a new approach to making AI content distinguishable without impacting performance.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/ai-labyrinth/"><u>How Cloudflare uses generative AI to slow down, confuse, and waste the resources of AI Crawlers and other bots that don’t respect “no crawl” directives.</u></a></p></td><td><p>How Cloudflare uses generative AI to slow down, confuse, and waste the resources of AI Crawlers and other bots that don’t respect “no crawl” directives.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/take-control-of-public-ai-application-security-with-cloudflare-firewall-for-ai/"><u>Take control of public AI application security with Cloudflare's Firewall for AI</u></a></p></td><td><p>Firewall for AI discovers and protects your public LLM-powered applications, and is seamlessly integrated with Cloudflare WAF. Join the beta now and take control of your generative AI security</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/bots-heuristics/"><u>Improved Bot Management flexibility and visibility with new high-precision heuristics</u></a></p></td><td><p>By building and integrating a new heuristics framework into the Cloudflare Ruleset Engine, we now have a more flexible system to write rules and deploy new releases rapidly</p></td></tr></table>
    <div>
      <h3>Simplifying security</h3>
      <a href="#simplifying-security">
        
      </a>
    </div>
    <table><tr><td><p><b>Title</b></p></td><td><p><b>Excerpt</b></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/introducing-ai-agent/"><u>Introducing Cloudy, Cloudflare’s AI agent for simplifying complex configurations</u></a></p></td><td><p>Cloudflare’s first AI agent, Cloudy, helps make complicated configurations easy to understand for Cloudflare administrators.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/new-application-security-experience/"><u>Making Application Security simple with a new unified dashboard experience</u></a></p></td><td><p>We’re introducing a new Application Security experience in the Cloudflare dashboard, with a reworked UI organized by use cases, making it easier for customers to navigate and secure their accounts</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/improved-support-for-private-applications-and-reusable-access-policies-with-cloudflare-access/"><u>Improved support for private applications and reusable access policies with Cloudflare Access</u></a></p></td><td><p>We are excited to introduce support for private hostname and IP address-defined applications as well as reusable access policies.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/aegis-deep-dive/"><u>Simplify allowlist management and lock down origin access with Cloudflare Aegis</u></a></p></td><td><p>Cloudflare Aegis provides dedicated egress IPs for Zero Trust origin access strategies, now supporting BYOIP and customer-facing configurability, with observability of Aegis IP address utilization coming soon.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/https-only-for-cloudflare-apis-shutting-the-door-on-cleartext-traffic"><u>HTTPS-only for Cloudflare APIs: shutting the door on cleartext traffic</u></a></p></td><td><p>We are closing the cleartext HTTP ports entirely for Cloudflare API traffic. This prevents the risk of clients unintentionally leaking their secret API keys in cleartext during the initial request, before we can reject the connection at the server side.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/cloudflare-named-leader-waf-forrester-2025/"><u>Cloudflare named a leader in Web Application Firewall Solutions in 2025 Forrester report</u></a></p></td><td><p>Forrester Research has recognized Cloudflare as a Leader in its The Forrester Wave™: Web Application Firewall Solutions, Q1 2025 report.</p></td></tr></table>
    <div>
      <h3>Data security everywhere, all the time </h3>
      <a href="#data-security-everywhere-all-the-time">
        
      </a>
    </div>
    <table><tr><td><p><b>Title</b></p></td><td><p><b>Excerpt</b></p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/scan-cloud-dlp-with-casb"><u>Detecting sensitive data and misconfigurations in AWS and GCP with Cloudflare One</u></a></p></td><td><p>Using Cloudflare’s CASB, integrate, scan, and detect sensitive data and misconfigurations in your cloud storage accounts.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/browser-based-rdp"><u>RDP without the risk: Cloudflare's browser-based solution for secure third-party access</u></a></p></td><td><p>Cloudflare now provides clientless, browser-based support for the Remote Desktop Protocol (RDP). It natively enables secure, remote Windows server access without VPNs or RDP clients, to support third-party access and BYOD security.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/improving-data-loss-prevention-accuracy-with-ai-context-analysis"><u>Improving Data Loss Prevention accuracy with AI-powered context analysis</u></a></p></td><td><p>Cloudflare’s Data Loss Prevention is reducing false positives by using a self-improving AI-powered algorithm, built on Cloudflare’s Developer Platform, to improve detection accuracy through AI context analysis.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/enhance-data-protection-in-microsoft-outlook-with-cloudflare-ones-new-dlp"><u>Enhance data protection in Microsoft Outlook with Cloudflare One’s new DLP Assist</u></a></p></td><td><p>Customers can now easily safeguard sensitive data in Microsoft Outlook with our new DLP Assist feature.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/lattice-crypto-primer"><u>Prepping for post-quantum: a beginner’s guide to lattice cryptography</u></a></p></td><td><p>This post is a beginner's guide to lattices, the math at the heart of the transition to post-quantum (PQ) cryptography. It explains how to do lattice-based encryption and authentication from scratch.</p></td></tr><tr><td><p><a href="https://blog.cloudflare.com/irap-protected-assessment"><u>Cloudflare is now IRAP assessed at the PROTECTED level, furthering our commitment to the global public sector</u></a></p></td><td><p>Cloudflare is now assessed at the IRAP PROTECTED level, bringing our products and services to the Australian Public Sector.</p></td></tr></table>
    <div>
      <h2>Tune in to the latest on Cloudflare TV</h2>
      <a href="#tune-in-to-the-latest-on-cloudflare-tv">
        
      </a>
    </div>
    <p>For a deeper dive on many of the great announcements from Security Week, <a href="https://www.cloudflare.com/security-week-2025/cloudflare-tv/"><u>check out our CFTV segments</u></a> where our team shares even more details on our latest updates. </p><div>
  
</div>
<p></p>
    <div>
      <h2>See you for our next Innovation Week</h2>
      <a href="#see-you-for-our-next-innovation-week">
        
      </a>
    </div>
    <p>We appreciate everyone who’s taken the time to read Cloudflare’s Security Week blog posts or engage with us on these topics via social media. Our next innovation week, <a href="https://www.cloudflare.com/developer-week-2024/updates/"><u>Developer Week</u></a>, is right around the corner in April. We look forward to seeing you then!</p> ]]></content:encoded>
            <category><![CDATA[Security Week]]></category>
            <category><![CDATA[Application Security]]></category>
            <category><![CDATA[Zero Trust]]></category>
            <category><![CDATA[Email Security]]></category>
            <category><![CDATA[AI]]></category>
            <guid isPermaLink="false">57bTt5UYhnjdF2MwuEePqb</guid>
            <dc:creator>Kim Blight</dc:creator>
            <dc:creator>Adam Martinetti</dc:creator>
            <dc:creator>Alex Dunbrack</dc:creator>
        </item>
        <item>
            <title><![CDATA[Improved Bot Management flexibility and visibility with new high-precision heuristics]]></title>
            <link>https://blog.cloudflare.com/bots-heuristics/</link>
            <pubDate>Wed, 19 Mar 2025 13:00:00 GMT</pubDate>
            <description><![CDATA[ By building and integrating a new heuristics framework into the Cloudflare Ruleset Engine, we now have a more flexible system to write rules and deploy new releases rapidly. ]]></description>
            <content:encoded><![CDATA[ <p>Within the Cloudflare Application Security team, every <a href="https://www.cloudflare.com/learning/ai/what-is-machine-learning/"><u>machine learning</u></a> model we use is underpinned by a rich set of static rules that serve as a ground truth and a baseline comparison for how our models are performing. These are called heuristics. Our Bot Management heuristics engine has served as an important part of eight global <a href="https://developers.cloudflare.com/bots/concepts/bot-score/#machine-learning"><u>machine learning (ML) models</u></a>, but we needed a more expressive engine to increase our accuracy. In this post, we’ll review how we solved this by moving our heuristics to the Cloudflare <a href="https://developers.cloudflare.com/ruleset-engine/"><u>Ruleset Engine</u></a>. Not only did this provide the platform we needed to write more nuanced rules, it made our platform simpler and safer, and provided <a href="https://www.cloudflare.com/application-services/products/bot-management/"><u>Bot Management</u></a> customers more flexibility and visibility into their bot traffic.   </p>
    <div>
      <h3>Bot detection via simple heuristics</h3>
      <a href="#bot-detection-via-simple-heuristics">
        
      </a>
    </div>
    <p>In Cloudflare’s bot detection, we build heuristics from attributes like software library fingerprints, HTTP request characteristics, and internal threat intelligence. Heuristics serve three separate purposes for bot detection: </p><ol><li><p>Bot identification: If traffic matches a heuristic, we can identify the traffic as definitely automated traffic (with a <a href="https://developers.cloudflare.com/bots/concepts/bot-score/"><u>bot score</u></a> of 1) without the need of a machine learning model. </p></li><li><p>Train ML models: When traffic matches our heuristics, we create labelled datasets of bot traffic to train new models. We’ll use many different sources of labelled bot traffic to train a new model, but our heuristics datasets are one of the highest confidence datasets available to us.   </p></li><li><p>Validate models: We benchmark any new model candidate’s performance against our heuristic detections (among many other checks) to make sure it meets a required level of accuracy.</p></li></ol><p>While the existing heuristics engine has worked very well for us, as bots evolved we needed the flexibility to write increasingly complex rules. Unfortunately, such rules were not easily supported in the old engine. Customers have also been asking for more details about which specific heuristic caught a request, and for the flexibility to enforce different policies per heuristic ID.  We found that by building a new heuristics framework integrated into the Cloudflare Ruleset Engine, we could build a more flexible system to write rules and give Bot Management customers the granular explainability and control they were asking for. </p>
    <div>
      <h3>The need for more efficient, precise rules</h3>
      <a href="#the-need-for-more-efficient-precise-rules">
        
      </a>
    </div>
    <p>In our previous heuristics engine, we wrote rules in <a href="https://www.lua.org/"><u>Lua</u></a> as part of our <a href="https://openresty.org/"><u>openresty</u></a>-based reverse proxy. The Lua-based engine was limited to a very small number of characteristics in a rule because of the high engineering cost we observed with adding more complexity.</p><p>With Lua, we would write fairly simple logic to match on specific characteristics of a request (i.e. user agent). Creating new heuristics of an existing class was fairly straight forward. All we’d need to do is define another instance of the existing class in our database. However, if we observed malicious traffic that required more than two characteristics (as a simple example, user-agent and <a href="https://en.wikipedia.org/wiki/Autonomous_system_(Internet)"><u>ASN</u></a>) to identify, we’d need to create bespoke logic for detections. Because our Lua heuristics engine was bundled with the code that ran ML models and other important logic, all changes had to go through the same review and release process. If we identified malicious traffic that needed a new heuristic class, and we were also blocked by pending changes in the codebase, we’d be forced to either wait or rollback the changes. If we’re writing a new rule for an “under attack” scenario, every extra minute it takes to deploy a new rule can mean an unacceptable impact to our customer’s business. </p><p>More critical than time to deploy is the complexity that the heuristics engine supports. The old heuristics engine only supported using specific request attributes when creating a new rule. As bots became more sophisticated, we found we had to reject an increasing number of new heuristic candidates because we weren’t able to write precise enough rules. For example, we found a <a href="https://go.dev/"><u>Golang</u></a> TLS fingerprint frequently used by bots and by a small number of corporate VPNs. We couldn’t block the bots without also stopping the legitimate VPN usage as well, because the old heuristics platform lacked the flexibility to quickly compile sufficiently nuanced rules. Luckily, we already had the perfect solution with Cloudflare Ruleset Engine. </p>
    <div>
      <h3>Our new heuristics engine</h3>
      <a href="#our-new-heuristics-engine">
        
      </a>
    </div>
    <p>The Ruleset Engine is familiar to anyone who has written a WAF rule, Load Balancing rule, or Transform rule, <a href="https://blog.cloudflare.com/announcing-firewall-rules/"><u>just to name a few</u></a>. For Bot Management, the Wireshark-inspired syntax allows us to quickly write heuristics with much greater flexibility to vastly improve accuracy. We can write a rule in <a href="https://yaml.org/"><u>YAML</u></a> that includes arbitrary sub-conditions and inherit the same framework the WAF team uses to both ensure any new rule undergoes a rigorous testing process with the ability to rapidly release new rules to stop attacks in real-time. </p><p>Writing heuristics on the Cloudflare Ruleset Engine allows our engineers and analysts to write new rules in an easy to understand YAML syntax. This is critical to supporting a rapid response in under attack scenarios, especially as we support greater rule complexity. Here’s a simple rule using the new engine, to detect empty user-agents restricted to a specific JA4 fingerprint (right), compared to the empty user-agent detection in the old Lua based system (left): </p><table><tr><td><p><b>Old</b></p></td><td><p><b>New</b></p></td></tr><tr><td><p><code>local _M = {}</code></p><p><code>local EmptyUserAgentHeuristic = {</code></p><p><code>   heuristic = {},</code></p><p><code>}</code></p><p><code>EmptyUserAgentHeuristic.__index = EmptyUserAgentHeuristic</code></p><p><code>--- Creates and returns empty user agent heuristic</code></p><p><code>-- @param params table contains parameters injected into EmptyUserAgentHeuristic</code></p><p><code>-- @return EmptyUserAgentHeuristic table</code></p><p><code>function _M.new(params)</code></p><p><code>   return setmetatable(params, EmptyUserAgentHeuristic)</code></p><p><code>end</code></p><p><code>--- Adds heuristic to be used for inference in `detect` method</code></p><p><code>-- @param heuristic schema.Heuristic table</code></p><p><code>function EmptyUserAgentHeuristic:add(heuristic)</code></p><p><code>   self.heuristic = heuristic</code></p><p><code>end</code></p><p><code>--- Detect runs empty user agent heuristic detection</code></p><p><code>-- @param ctx context of request</code></p><p><code>-- @return schema.Heuristic table on successful detection or nil otherwise</code></p><p><code>function EmptyUserAgentHeuristic:detect(ctx)</code></p><p><code>   local ua = ctx.user_agent</code></p><p><code>   if not ua or ua == '' then</code></p><p><code>      return self.heuristic</code></p><p><code>   end</code></p><p><code>end</code></p><p><code>return _M</code></p></td><td><p><code>ref: empty-user-agent</code></p><p><code>      description: Empty or missing </code></p><p><code>User-Agent header</code></p><p><code>      action: add_bot_detection</code></p><p><code>      action_parameters:</code></p><p><code>        active_mode: false</code></p><p><code>      expression: http.user_agent eq </code></p><p><code>"" and cf.bot_management.ja4 = "t13d1516h2_8daaf6152771_b186095e22b6"</code></p></td></tr></table><p>The Golang heuristic that captured corporate proxy traffic as well (mentioned above) was one of the first to migrate to the new Ruleset engine. Before the migration, traffic matching on this heuristic had a false positive rate of 0.01%. While that sounds like a very small number, this means for every million bots we block, 100 real users saw a Cloudflare challenge page unnecessarily. At Cloudflare scale, even small issues can have real, negative impact.</p><p>When we analyzed the traffic caught by this heuristic rule in depth, we saw the vast majority of attack traffic came from a small number of abusive networks. After narrowing the definition of the heuristic to flag the Golang fingerprint only when it’s sourced by the abusive networks, the rule now has a false positive rate of 0.0001% (One out of 1 million).  Updating the heuristic to include the network context improved our accuracy, while still blocking millions of bots every week and giving us plenty of training data for our bot detection models. Because this heuristic is now more accurate, newer ML models make more accurate decisions on what’s a bot and what isn’t.</p>
    <div>
      <h3>New visibility and flexibility for Bot Management customers </h3>
      <a href="#new-visibility-and-flexibility-for-bot-management-customers">
        
      </a>
    </div>
    <p>While the new heuristics engine provides more accurate detections for all customers and a better experience for our analysts, moving to the Cloudflare Ruleset Engine also allows us to deliver new functionality for Enterprise Bot Management customers, specifically by offering more visibility. This new visibility is via a new field for Bot Management customers called Bot Detection IDs. Every heuristic we use includes a unique Bot Detection ID. These are visible to Bot Management customers in analytics, logs, and firewall events, and they can be used in the firewall to write precise rules for individual bots. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3cYXHw8tFUjdJQGm93gFcE/0a3f6ab89a70410ebb7dd2c6f4f3a677/1.png" />
          </figure>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/9d7L1r7yN9AEhO26H7SgQ/434f0f934dd4f4498a8d13e85a7660ae/2.png" />
          </figure><p>Detections also include a specific tag describing the class of heuristic. Customers see these plotted over time in their analytics. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2UlkGQ070UWsmU76IXYqDd/6bca8f28959a8fe8f3013792a17b348a/image4.png" />
          </figure><p>To illustrate how this data can help give customers visibility into why we blocked a request, here’s an example request flagged by Bot Management (with the IP address, ASN, and country changed):</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3i6k9ejsBVwlbXUrZFIFJG/4c9cddd11d9f7a8ddf10eb5ff30a027b/4.png" />
          </figure><p>Before, just seeing that our heuristics gave the request a score of 1 was not very helpful in understanding why it was flagged as a bot. Adding our Detection IDs to Firewall Events helps to paint a better picture for customers that we’ve identified this request as a bot because that traffic used an empty user-agent. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5UQd20VnXCnIav1skHiX6i/18e0cf01106601ae7faf18be50d43365/5.png" />
          </figure><p>In addition to Analytics and Firewall Events, Bot Detection IDs are now available for Bot Management customers to use in Custom Rules, Rate Limiting Rules, Transform Rules, and Workers. </p>
    <div>
      <h4>Account takeover detection IDs</h4>
      <a href="#account-takeover-detection-ids">
        
      </a>
    </div>
    <p>One way we’re focused on improving Bot Management for our customers is by surfacing more attack-specific detections. During Birthday Week, we <a href="https://blog.cloudflare.com/a-safer-internet-with-cloudflare/"><u>launched Leaked Credentials Check</u></a> for all customers so that security teams could help <a href="https://www.cloudflare.com/zero-trust/solutions/account-takeover-prevention/"><u>prevent account takeover (ATO) attacks</u></a> by identifying accounts at risk due to leaked credentials. We’ve now added two more detections that can help Bot Management enterprise customers identify suspicious login activity via specific <a href="https://developers.cloudflare.com/bots/concepts/detection-ids/#account-takeover-detections"><u>detection IDs</u></a> that monitor login attempts and failures on the zone. These detection IDs are not currently affecting the bot score, but will begin to later in 2025. Already, they can help many customers detect more account takeover events now.</p><p>Detection ID <b>201326592 </b>monitors traffic on a customer website and looks for an anomalous rise in login failures (usually associated with brute force attacks), and ID <b>201326593 </b>looks for an anomalous rise in login attempts (usually associated with credential stuffing). </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4a2LGgB1bXGwFgHQEKFSI9/ff4194a6a470e466274f7b7c51e9dc18/6.png" />
          </figure>
    <div>
      <h3>Protect your applications</h3>
      <a href="#protect-your-applications">
        
      </a>
    </div>
    <p>If you are a Bot Management customer, log in and head over to the Cloudflare dashboard and take a look in Security Analytics for bot detection IDs <code>201326592</code> and <code>201326593</code>.</p><p>These will highlight ATO attempts targeting your site. If you spot anything suspicious, or would like to be protected against future attacks, create a rule that uses these detections to keep your application safe.</p> ]]></content:encoded>
            <category><![CDATA[Security Week]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[Bot Management]]></category>
            <category><![CDATA[Application Security]]></category>
            <category><![CDATA[Machine Learning]]></category>
            <category><![CDATA[Heuristics]]></category>
            <category><![CDATA[Edge Rules]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">4IkgXzyemEEsN7A6Cd18hb</guid>
            <dc:creator>Curtis Lowder</dc:creator>
            <dc:creator>Brian Mitchell</dc:creator>
            <dc:creator>Adam Martinetti</dc:creator>
        </item>
        <item>
            <title><![CDATA[Grinch Bots strike again: defending your holidays from cyber threats]]></title>
            <link>https://blog.cloudflare.com/grinch-bot-2024/</link>
            <pubDate>Mon, 23 Dec 2024 14:01:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare observed a 4x increase in bot-related traffic on Black Friday in 2024. 29% of all traffic on our network on Black Friday was Grinch Bots wreaking holiday havoc. ]]></description>
            <content:encoded><![CDATA[ 
    <div>
      <h2>Grinch Bots are still stealing Christmas</h2>
      <a href="#grinch-bots-are-still-stealing-christmas">
        
      </a>
    </div>
    <p>Back in 2021, we covered the antics of <a href="https://blog.cloudflare.com/grinch-bot/"><u>Grinch Bots</u></a> and how the combination of proposed regulation and technology could prevent these malicious programs from stealing holiday cheer.</p><p>Fast-forward to 2024 — the <a href="https://www.congress.gov/bill/117th-congress/senate-bill/3276/all-info#:~:text=%2F30%2F2021)-,Stopping%20Grinch%20Bots%20Act%20of%202021,or%20services%20in%20interstate%20commerce"><u>Stop Grinch Bots Act of 2021</u></a> has not passed, and bots are more active and powerful than ever, leaving businesses to fend off increasingly sophisticated attacks on their own. During Black Friday 2024, Cloudflare observed:</p><ul><li><p><b>29% of all traffic on Black Friday was Grinch Bots</b>. Humans still accounted for the majority of all traffic, but bot traffic was up 4x from three years ago in absolute terms. </p></li><li><p><b>1% of traffic on Black Friday came from AI bots. </b>The majority of it came from Claude, Meta, and Amazon. 71% of this traffic was given the green light to access the content requested. </p></li><li><p><b>63% of login attempts across our network on Black Friday were from bots</b>. While this number is high, it was down a few percentage points compared to a month prior, indicating that more humans accessed their accounts and holiday deals. </p></li><li><p><b>Human logins on e-commerce sites increased 7-8%</b> compared to the previous month. </p></li></ul><p>These days, holiday shopping doesn’t start on Black Friday and stop on Cyber Monday. Instead, it stretches through Cyber Week and beyond, including flash sales, pre-orders, and various other promotions. While this provides consumers more opportunities to shop, it also creates more openings for Grinch Bots to wreak havoc.</p>
    <div>
      <h2>Black Friday - Cyber Monday by the numbers</h2>
      <a href="#black-friday-cyber-monday-by-the-numbers">
        
      </a>
    </div>
    <p><a href="https://blog.cloudflare.com/the-truth-about-black-friday-and-cyber-monday/">Black Friday and Cyber Monday</a> in 2024 brought record-breaking shopping — and grinching. In addition to looking across our entire network, we also analyzed traffic patterns specifically on a cohort of e-commerce sites. </p><p>Legitimate shoppers flocked to e-commerce sites, with requests reaching an astounding 405 billion on Black Friday, accounting for 81% of the day’s total traffic to e-commerce sites. Retailers reaped the rewards of their deals and advertising, seeing a 50% surge in shoppers week-over-week and a 61% increase compared to the previous month.</p><p>Unfortunately, Grinch Bots were equally active. Total e-commerce bot activity surged to 103 billion requests, representing up to 19% of all traffic to e-commerce sites. Nearly one in every five requests to an online store was not a real customer. That’s a lot of resources to waste on bogus traffic. Cyber Week was a battleground, with bots hoarding inventory, exploiting deals, and disrupting genuine shopping experiences.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6XOfNPFabXiMzIrZrXGES2/9fa1eb6d1ca558ef9f405826999bcea2/image9.png" />
          </figure><p>The upside, if there is one, is that there was more human activity on e-commerce sites (81%) than observed on our network more broadly (71%). </p>
    <div>
      <h2>The Grinch Bot’s Modus Operandi</h2>
      <a href="#the-grinch-bots-modus-operandi">
        
      </a>
    </div>
    <p>Cloudflare saw 4x more bot requests than what we observed in 2021. Being able to observe and score all this traffic at scale means we can help customers keep the grinches away. We also got to see patterns that help us better identify the concentration of these attacks: </p><ul><li><p>19% of traffic on e-commerce sites was Grinch Bots</p></li><li><p>1% of traffic to e-commerce sites was from AI Bots. </p></li><li><p>63% of login attempt requests across our network were from bots </p></li><li><p>22% of bot activity originated from residential proxy networks</p></li></ul>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6Lo8y9dRRZGKujWxwfxBaf/4be5c93638b3dabe7acc823adac5ed6f/image3.png" />
          </figure><p>What are all of these bots up to? </p>
    <div>
      <h3><b>AI bots</b></h3>
      <a href="#ai-bots">
        
      </a>
    </div>
    <p>This year marked a breakthrough for AI-driven bots, agents, and models, with their impact spilling into Black Friday. AI bots went from zero to one, now making up 1% of all bot traffic on e-commerce sites. </p><p>AI-driven bots generated 29 billion requests on Black Friday alone, with Meta-external, Claudebot, and Amazonbot leading the pack. Based on their owners, these bots are meant to crawl to augment training data sets for Llama, Claude, and Alexa respectively. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6jzXxH2fR5KWRWc252SbNH/ad40f4a4cb8cf1fdc36558dc4324b717/image10.png" />
          </figure><p>We looked at e-commerce sites specifically to find out if these bots were treating all content equally. While Meta-External and Amazonbot were still in the Top 3 of AI bots reaching e-commerce sites, Bytedance’s Bytespider crawled the most shopping sites.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6jOsfPtliY01FSNVVgb2bZ/f444cee15d3818d24ef2e3e53731896c/image2.png" />
          </figure>
    <div>
      <h3><b>Account Takeover (ATO) bots</b></h3>
      <a href="#account-takeover-ato-bots">
        
      </a>
    </div>
    <p>In addition to <a href="https://www.cloudflare.com/learning/ai/how-to-prevent-web-scraping/">scraping</a>, crawling, and shopping, bots also targeted customer accounts on Black Friday. We saw 14.1 billion requests from bots to /login endpoints, accounting for 63% of that day’s login attempts. </p><p>While this number seems high, intuitively it makes sense, given that humans don’t log in to accounts every day, but bots definitely try to crack accounts every day. Interestingly, while humans only accounted for 36% of traffic to login pages on Black Friday, this number was <b><i>up 7-8% compared to the prior month</i></b>. This suggests that more shoppers logged in to capitalize on deals and discounts on Black Friday than in preceding weeks. Human logins peaked at around 40% of all traffic to login sites on the Monday before Thanksgiving, and again on Cyber Monday.  </p><p>Separately, we also saw a 37% increase in leaked passwords used in login requests compared to the prior month. During Birthday Week, we shared <a href="https://blog.cloudflare.com/a-safer-internet-with-cloudflare/#account-takeover-detection"><u>how 65% of internet users are at risk of ATO due to re-use of leaked passwords</u></a>. This surge, coinciding with heightened human and bot traffic, underscores a troubling pattern: both humans and bots continue to depend on common and compromised passwords, amplifying security risks.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1DDDcH41X7fhmfk2VdNaNV/de40dce017459752ba366e5fea331377/image7.png" />
          </figure><p><b>Proxy bots: </b>Regardless of whether they’re crawling your content or hoarding your wares, 22% of bot traffic originated from <a href="https://blog.cloudflare.com/residential-proxy-bot-detection-using-machine-learning/"><u>residential proxy networks</u></a>. This obfuscation makes these requests look like legitimate customers browsing from their homes rather than large cloud networks. The large pool of IP addresses and the diversity of networks poses a challenge to traditional bot defense mechanisms that rely on IP reputation and rate limiting. </p><p>Moreover, the diversity of IP addresses enables the attackers to rotate through them indefinitely. This shrinks the window of opportunity for bot detection systems to effectively detect and stop the attacks. The use of residential proxies is a trend we have been tracking for months now and Black Friday traffic was within the range we’ve seen throughout this year.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1Rk3c2sq0kLpl9Y71NdL8K/1186c343f760dda1f64adb4dd8716170/image1.png" />
          </figure><p>If you’re using Cloudflare’s <a href="https://www.cloudflare.com/en-gb/application-services/products/bot-management/"><u>Bot Management</u></a>, your site is already protected from these bots since we update our bot score based on these types of network fingerprints. In May 2024, we <a href="https://blog.cloudflare.com/residential-proxy-bot-detection-using-machine-learning/"><u>introduced</u></a> our latest model optimized for detecting residential proxies. Early results show promising declines in this type of activity, indicating that bot operators may be reducing their reliance on residential proxies. </p>
    <div>
      <h2>The Christmas “Yule” log: how customers can protect themselves</h2>
      <a href="#the-christmas-yule-log-how-customers-can-protect-themselves">
        
      </a>
    </div>
    <p>35% of all traffic on Black Friday was Grinch Bots. To keep Grinch Bots at bay, businesses need year-round bot protection and proactive strategies tailored to the unique challenges of holiday shopping.</p><p>Here are 4 yules (aka “rules”) for the season:</p><p><b>(1) Block bots</b>: 22% of bot traffic originated from residential proxy networks. Our bot management <a href="https://blog.cloudflare.com/residential-proxy-bot-detection-using-machine-learning/"><u>score automatically adjusts based on these network signals</u></a>. Use our <a href="https://developers.cloudflare.com/bots/concepts/bot-score/"><u>Bot Score</u></a> in rules to challenge sensitive actions. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3yNb5lIJ0kUuUkwrwRIFV4/21bd315b382663d5601d4629c8bec3a2/image8.png" />
          </figure><p><b>(2) Monitor potential Account Takeover (ATO) attacks</b>: Bots often test <a href="https://blog.cloudflare.com/helping-keep-customers-safe-with-leaked-password-notification/"><u>stolen credentials</u></a> in the months leading up to Cyber Week to refine their strategies. Re-use of stolen credentials makes businesses even more vulnerable. Our account abuse detections help customers monitor login paths for leaked credentials and traffic anomalies.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/20ICo3eYcY6BTdqAJE5Kj9/90b1373280aaeef54dfbed3c41e5ea8c/Screenshot_2024-12-21_at_17.52.17.png" />
          </figure><p>Check out more examples of related <a href="https://developers.cloudflare.com/waf/detections/leaked-credentials/examples/"><u>rules</u></a> you can create.</p><p><b>(3) Rate limit account and purchase paths: </b>Apply rate-limiting <a href="https://developers.cloudflare.com/waf/rate-limiting-rules/best-practices/"><u>best practices</u></a> on critical application paths. These include limiting new account access/creation from previously seen IP addresses, and leveraging other network fingerprints, to help prevent promo code abuse and inventory hoarding, as well as identifying account takeover attempts through the application of <a href="https://developers.cloudflare.com/bots/concepts/detection-ids/"><u>detection IDs</u></a> and <a href="https://developers.cloudflare.com/waf/managed-rules/check-for-exposed-credentials/"><u>leaked credential checks</u></a>.</p><p><b>(4) Block AI bots</b> abusing shopping features to maintain fair access for human users. If you’re using Cloudflare, you can quickly <a href="https://www.cloudflare.com/learning/ai/how-to-block-ai-crawlers/">block all AI bots</a> by <a href="https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/"><u>enabling our automatic AI bot blocking</u></a> feature.  </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5vfiWs6fEkloTwGblzkltR/53c9a89a323a4bb9107ea8f3e83e9b49/image11.png" />
          </figure>
    <div>
      <h2>What to expect in 2025? </h2>
      <a href="#what-to-expect-in-2025">
        
      </a>
    </div>
    <p>Over the next year, e-commerce sites should expect to see more humans shopping for longer periods. As sale periods lengthen (like they did in 2024) we expect more peaks in human activity on e-commerce sites across November and December. This is great for consumers and great for merchants.</p><p>More AI bots and agents will be integrated into e-commerce journeys in 2025. AI bots will not only be crawling sites for training data, but will also integrate into the shopping experience. AI bots did not exist in 2021, but now make up 1% of all bot traffic. This is only the tip of the iceberg and their growth will explode in the next year. We expect this to pose new risks as bots mimic and act on behalf of humans.</p><p>More sophisticated automation through network, device, and cookie cycling will also become a bigger threat. Bot operators will continue to employ advanced evasion tactics like rotating devices, IP addresses, and cookies to bypass detection.</p><p>Grinch Bots are evolving, and regulation may be slowing, but businesses don’t have to face them alone. We remain resolute in our mission to help build a better Internet ... and holiday shopping experience.</p><p>Even though the holiday season is closing out soon, bots are never on vacation. It’s never too late or too early to start protecting your customers and your business from grinches that work all year round.</p><p>Wishing you all happy holidays and a bot-free new year!</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1E192osHHmrXszsfbE2fIa/f7d04ea59ef6355eb78fed4e2fa15bd4/image6.png" />
          </figure><p></p> ]]></content:encoded>
            <category><![CDATA[AI Bots]]></category>
            <category><![CDATA[Grinch]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[Bot Management]]></category>
            <category><![CDATA[Application Security]]></category>
            <category><![CDATA[Application Services]]></category>
            <guid isPermaLink="false">5yiWFM9NumXY8HARoEP8x6</guid>
            <dc:creator>Avi Jaisinghani</dc:creator>
            <dc:creator>Adam Martinetti</dc:creator>
            <dc:creator>Brian Mitchell</dc:creator>
        </item>
        <item>
            <title><![CDATA[AI Everywhere with the WAF Rule Builder Assistant, Cloudflare Radar AI Insights, and updated AI bot protection]]></title>
            <link>https://blog.cloudflare.com/bringing-ai-to-cloudflare/</link>
            <pubDate>Fri, 27 Sep 2024 13:00:00 GMT</pubDate>
            <description><![CDATA[ This year for Cloudflare’s birthday, we’ve extended our AI Assistant capabilities to help you build new WAF rules, added new AI bot & crawler traffic insights to Radar, and given customers new AI bot  ]]></description>
            <content:encoded><![CDATA[ <p>The continued growth of AI has fundamentally changed the Internet over the past 24 months. AI is increasingly ubiquitous, and Cloudflare is leaning into the new opportunities and challenges it presents in a big way. This year for Cloudflare’s birthday, we’ve extended our AI Assistant capabilities to help you build new WAF rules, added AI bot traffic insights on Cloudflare Radar, and given customers new <a href="https://www.cloudflare.com/learning/ai/how-to-block-ai-crawlers/">AI bot blocking capabilities</a>.  </p>
    <div>
      <h2>AI Assistant for WAF Rule Builder</h2>
      <a href="#ai-assistant-for-waf-rule-builder">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5RYC4wmCDbs0axY92FfkFk/a728906cb6a902dd1c78ec93a0f650c2/BLOG-2564_1.png" />
          </figure><p>At Cloudflare, we’re always listening to your feedback and striving to make our products as user-friendly and powerful as possible. One area where we've heard your feedback loud and clear is in the complexity of creating custom and rate-limiting rules for our Web Application Firewall (WAF). With this in mind, we’re excited to introduce a new feature that will make rule creation easier and more intuitive: the AI Assistant for WAF Rule Builder. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7avSjubqlfg7L8ymKEztgk/7c3c31e50879ec64bccc384bdfcd5524/BLOG-2564_2.png" />
          </figure><p>By simply entering a natural language prompt, you can generate a custom or rate-limiting rule tailored to your needs. For example, instead of manually configuring a complex rule matching criteria, you can now type something like, "Match requests with low bot score," and the assistant will generate the rule for you. It’s not about creating the perfect rule in one step, but giving you a strong foundation that you can build on. </p><p>The assistant will be available in the Custom and Rate Limit Rule Builder for all WAF users. We’re launching this feature in Beta for all customers, and we encourage you to give it a try. We’re looking forward to hearing your feedback (via the UI itself) as we continue to refine and enhance this tool to meet your needs.</p>
    <div>
      <h2>AI bot traffic insights on Cloudflare Radar</h2>
      <a href="#ai-bot-traffic-insights-on-cloudflare-radar">
        
      </a>
    </div>
    <p>AI platform providers use bots to crawl and scrape websites, vacuuming up data to use for model training. This is frequently done without the permission of, or a business relationship with, the content owners and providers. In July, Cloudflare urged content owners and providers to <a href="https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/"><u>“declare their AIndependence”</u></a>, providing them with a way to block AI bots, <a href="https://www.cloudflare.com/learning/ai/how-to-prevent-web-scraping/">scrapers</a>, and crawlers with a single click. In addition to this so-called “easy button” approach, sites can provide more specific guidance to these bots about what they are and are not allowed to access through directives in a <a href="https://www.cloudflare.com/en-gb/learning/bots/what-is-robots-txt/"><u>robots.txt</u></a> file. Regardless of whether a customer chooses to block or allow requests from AI-related bots, Cloudflare has insight into request activity from these bots, and associated traffic trends over time.</p><p>Tracking traffic trends for AI bots can help us better understand their activity over time — which are the most aggressive and have the highest volume of requests, which launch crawls on a regular basis, etc. The new <a href="https://radar.cloudflare.com/traffic#ai-bot-crawler-traffic"><b><u>AI bot &amp; crawler traffic </u></b><u>graph on Radar’s Traffic page</u></a> provides insight into these traffic trends gathered over the selected time period for the top known AI bots. The associated list of bots tracked here is based on the <a href="https://github.com/ai-robots-txt/ai.robots.txt"><u>ai.robots.txt list</u></a>, and will be updated with new bots as they are identified. <a href="https://developers.cloudflare.com/api/operations/radar-get-ai-bots-timeseries-group-by-user-agent"><u>Time series</u></a> and <a href="https://developers.cloudflare.com/api/operations/radar-get-ai-bots-summary-by-user-agent"><u>summary</u></a> data is available from the Radar API as well. (Traffic trends for the full set of AI bots &amp; crawlers <a href="https://radar.cloudflare.com/explorer?dataSet=ai.bots"><u>can be viewed in the new Data Explorer</u></a>.)</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5tYefQaBhTPYpqZPtE6KPu/f60694d0b24de2acba13fe0944589885/BLOG-2564_3.png" />
          </figure>
    <div>
      <h2>Blocking more AI bots</h2>
      <a href="#blocking-more-ai-bots">
        
      </a>
    </div>
    
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/UiFu8l6K4Pm3ulxTK3XU0/541d109e29a9ae94e4792fdf94f7e4aa/BLOG-2564_4.png" />
          </figure><p>For Cloudflare’s birthday, we’re following up on our previous blog post, <a href="https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/"><i><u>Declaring Your AIndependence</u></i></a>, with an update on the new detections we’ve added to stop AI bots. Customers who haven’t already done so can simply <a href="https://dash.cloudflare.com/?to=/:account/:zone/security/bots/configure"><u>click the button</u></a> to block AI bots to gain more protection for their website. </p>
    <div>
      <h3>Enabling dynamic updates for the AI bot rule</h3>
      <a href="#enabling-dynamic-updates-for-the-ai-bot-rule">
        
      </a>
    </div>
    <p>The old button allowed customers to block <i>verified</i> AI crawlers, those that respect robots.txt and crawl rate, and don’t try to hide their behavior. We’ve added new crawlers to that list, but we’ve also expanded the previous rule to include 27 signatures (and counting) of AI bots that <i>don’t </i>follow the rules. We want to take time to say “thank you” to everyone who took the time to use our “<a href="https://docs.google.com/forms/d/14bX0RJH_0w17_cAUiihff5b3WLKzfieDO4upRlo5wj8"><u>tip line</u></a>” to point us towards new AI bots. These tips have been extremely helpful in finding some bots that would not have been on our radar so quickly. </p><p>For each bot we’ve added, we’re also adding them to our “Definitely automated” definition as well. So, if you’re a self-service plan customer using <a href="https://blog.cloudflare.com/super-bot-fight-mode/"><u>Super Bot Fight Mode</u></a>, you’re already protected. Enterprise Bot Management customers will see more requests shift from the “Likely Bot” range to the “Definitely automated” range, which we’ll discuss more below.</p><p>Under the hood, we’ve converted this rule logic to a <a href="https://developers.cloudflare.com/waf/managed-rules/"><u>Cloudflare managed rule</u></a> (the same framework that powers our WAF). This enables our security analysts and engineers to safely push updates to the rule in real-time, similar to how new WAF rule changes are rapidly delivered to ensure our customers are protected against the latest CVEs. If you haven’t logged back into the Bots dashboard since the previous version of our AI bot protection was announced, click the button again to update to the latest protection. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2tI8Yqxt1S0UPapImb32J4/6cb9e9bf423c370383edb820e5722929/BLOG-2564_5.png" />
          </figure>
    <div>
      <h3>The impact of new fingerprints on the model </h3>
      <a href="#the-impact-of-new-fingerprints-on-the-model">
        
      </a>
    </div>
    <p>One hidden beneficiary of fingerprinting new AI bots is our ML model. <a href="https://blog.cloudflare.com/cloudflare-bot-management-machine-learning-and-more/"><u>As we’ve discussed before</u></a>, our global ML model uses supervised machine learning and greatly benefits from more sources of labeled bot data. Below, you can see how well our ML model recognized these requests as automated, before and after we updated the button, adding new rules. To keep things simple, we have shown only the top 5 bots by the volume of requests on the chart. With the introduction of our new managed rule, we have observed an improvement in our detection capabilities for the majority of these AI bots. Button v1 represents the old option that let customers block only verified AI crawlers, while Button v2 is the newly introduced feature that includes managed rule detections.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2CZVGyDCp9ZtMrZdIi49fE/aacd04d240e9348b5a9b65bad4b470e2/BLOG-2564_6.jpg" />
          </figure><p>So how did we make our detections more robust? As we have mentioned before, sometimes <a href="https://blog.cloudflare.com/cloudflare-bot-management-machine-learning-and-more/"><i><u>a single attribute can give a bot away</u></i></a>. We developed a sophisticated set of heuristics tailored to these AI bots, enabling us to effortlessly and accurately classify them as such. Although our ML model was already detecting the vast majority of these requests, the integration of additional heuristics has resulted in a noticeable increase in detection rates for each bot, and ensuring we score every request correctly 100% of the time. Transitioning from a purely machine learning approach to incorporating heuristics offers several advantages, including faster detection times and greater certainty in classification. While deploying a machine learning model is complex and time-consuming, new heuristics can be created in minutes. </p><p>The initial launch of the AI bots block button was well-received and is now used by over 133,000 websites, with significant adoption even among our Free tier customers. The newly updated button, launched on August 20, 2024, is rapidly gaining traction. Over 90,000 zones have already adopted the new rule, with approximately 240 new sites integrating it every hour. Overall, we are now helping to protect the intellectual property of more than 146,000 sites from AI bots, and we are currently blocking 66 million requests daily with this new rule. Additionally, we’re excited to announce that support for configuring AI bots protection via Terraform will be available by the end of this year, providing even more flexibility and control for managing your bot protection settings.</p>
    <div>
      <h3>Bot behavior</h3>
      <a href="#bot-behavior">
        
      </a>
    </div>
    <p>With the enhancements to our detection capabilities, it is essential to assess the impact of these changes to bot activity on the Internet. Since the launch of the updated AI bots block button, we have been closely monitoring for any shifts in bot activity and adaptation strategies. The most basic fingerprinting technique we use to identify AI bot looking for simple user-agent matches. User-agent matches are important to monitor because they indicate the bot is transparently announcing who they are when they’re crawling a website. </p><p>The graph below shows a volume of traffic we label as AI bot over the past two months. The blue line indicates the daily request count, while the red line represents the monthly average number of requests. In the past two months, we have seen an average reduction of nearly 30 million requests, with a decrease of 40 million in the most recent month.This decline coincides with the release of Button v1 and Button v2. Our hypothesis is that with the new AI bots blocking feature, Cloudflare is blocking a majority of these bots, which is discouraging them from crawling. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/23ULxmxBIRskEONlWVIvlA/1dbd3d03239047492c2d4f7307217d97/BLOG-2564_7.jpg" />
          </figure><p>This hypothesis is supported by the observed decline in requests from several top AI crawlers. Specifically, the Bytespider bot reduced its daily requests from approximately 100 million to just 50 million between the end of June and the end of August (see graph below). This reduction could be attributed to several factors, including our new AI bots block button and changes in the crawler's strategy.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5UwtyZSXULrVzIqLcICGKd/fdf02c15d17e1d7ed248ba5f8a97eb54/BLOG-2564_8.jpg" />
          </figure><p>We have also observed an increase in the accountability of some AI crawlers. The most basic fingerprinting technique we use to identify AI bot looking for simple user-agent matches. User-agent matches are important to monitor because they indicate the bot is transparently announcing who they are when they’re crawling a website. These crawlers are now more frequently using their agents, reflecting a shift towards more transparent and responsible behavior. Notably, there has been a dramatic surge in the number of requests from the Perplexity user agent. This increase might be linked to <a href="https://rknight.me/blog/perplexity-ai-is-lying-about-its-user-agent/">previous accusations<u> </u></a>that Perplexity did not properly present its user agent, which could have prompted a shift in their approach to ensure better identification and compliance. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7Hq2vUMqqdNCyaxNTCg3JD/610ad53d57203203c5176229245c8086/BLOG-2564_9.jpg" />
          </figure><p>These trends suggest that our updates are likely affecting how AI crawlers interact with content. We will continue to monitor AI bot activity to help users control who accesses their content and how. By keeping a close watch on emerging patterns, we aim to provide users with the tools and insights needed to make informed decisions about managing their traffic. </p>
    <div>
      <h2>Wrap up</h2>
      <a href="#wrap-up">
        
      </a>
    </div>
    <p>We’re excited to continue to explore the AI landscape, whether we’re finding more ways to make the Cloudflare dashboard usable or new threats to guard against. Our AI insights on Radar update in near real-time, so please join us in watching as new trends emerge and discussing them in the <a href="https://community.cloudflare.com/"><u>Cloudflare Community</u></a>. </p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Bot Management]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[AI Bots]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Machine Learning]]></category>
            <category><![CDATA[Generative AI]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Developer Platform]]></category>
            <category><![CDATA[Application Services]]></category>
            <category><![CDATA[Product News]]></category>
            <guid isPermaLink="false">6HqKUMoXg0wFIQg9howLMX</guid>
            <dc:creator>Adam Martinetti</dc:creator>
            <dc:creator>Harsh Saxena</dc:creator>
            <dc:creator>Gauri Baraskar</dc:creator>
            <dc:creator>Carlos Azevedo</dc:creator>
            <dc:creator>David Belson</dc:creator>
        </item>
        <item>
            <title><![CDATA[Advancing Threat Intelligence: JA4 fingerprints and inter-request signals]]></title>
            <link>https://blog.cloudflare.com/ja4-signals/</link>
            <pubDate>Mon, 12 Aug 2024 14:00:00 GMT</pubDate>
            <description><![CDATA[ Explore how Cloudflare's JA4 fingerprinting and inter-request signals provide robust and scalable insights for advanced web security and threat detection.
 ]]></description>
            <content:encoded><![CDATA[ <p>For many years, Cloudflare has used advanced fingerprinting techniques to help block online threats, in products like our <a href="https://blog.cloudflare.com/meet-gatebot-a-bot-that-allows-us-to-sleep"><u>DDoS engine</u></a>, <a href="https://blog.cloudflare.com/patching-the-internet-fixing-the-wordpress-br/"><u>our WAF</u></a>, and <a href="https://www.cloudflare.com/application-services/products/bot-management/"><u>Bot Management</u></a>. For the purposes of Bot Management, fingerprinting characteristic elements of client software help us quickly identify what kind of software is making an HTTP request. It’s an efficient and accurate way to differentiate a browser from a Python script, while preserving user privacy. These fingerprints are used on their own for simple rules, and they underpin complex machine learning models as well. </p><p>Making sure our fingerprints keep pace with the pace of change on the Internet is a constant and critical task. Bots will always adapt to try and look more browser-like. Less frequently, browsers will introduce major changes to their behavior and affect the entire Internet landscape. Last year, Google <a href="https://chromestatus.com/feature/5124606246518784"><u>did exactly that</u></a>, making older TLS fingerprints almost useless for identifying the latest version of Chrome.</p>
    <div>
      <h2>JA3 Fingerprint </h2>
      <a href="#ja3-fingerprint">
        
      </a>
    </div>
    <p>JA3 fingerprint introduced by <a href="https://github.com/salesforce/ja3"><u>Salesforce researchers</u></a> in 2017 and later adopted by Cloudflare, involves creating a hash of the TLS ClientHello message. This hash includes the ordered list of TLS cipher suites, extensions, and other parameters, providing a unique identifier for each client. Cloudflare customers can use JA3 to build detection rules and gain insight into their network traffic.</p><p>In early 2023, Google <a href="https://chromestatus.com/feature/5124606246518784"><u>implemented a change in Chromium-based browsers</u></a> to shuffle the order of TLS extensions – a strategy aimed at disrupting the detection capabilities of JA3 and enhancing the robustness of the TLS ecosystem. This modification was prompted by concerns that fixed fingerprint patterns could lead to rigid server implementations, potentially causing complications each time Chrome updates were rolled out. Over time, JA3 became less useful due to the following reasons:</p><ul><li><p><b>Randomization of TLS extensions:</b> Browsers began randomizing the order of TLS extensions in their ClientHello messages. This change meant that the JA3 fingerprints, which relied on the sequential order of these extensions, would vary with each connection, making it unreliable for identifying unique clients​. (Further information can be found at <a href="https://www.stamus-networks.com/blog/ja3-fingerprints-fade-browsers-embrace-tls-extension-randomization"><u>Stamus Networks</u></a>.)​</p></li><li><p><b>Inconsistencies across tools</b>: Different tools and databases that implemented JA3 fingerprinting often produced varying results due to discrepancies in how they handled TLS extensions and other protocol elements. This inconsistency hindered the effectiveness of JA3 fingerprints for reliable cross-organization sharing and threat intelligence.​ (Further information can be found at <a href="https://fingerprint.com/blog/limitations-ja3-fingerprinting-accurate-device-identification/"><u>Fingerprint</u></a>.)​</p></li><li><p><b>Limited scope and lack of adaptability</b>: JA3 focused solely on elements within the TLS ClientHello packet, covering only a narrow portion of the OSI model’s layers. This limited scope often missed crucial context about a client's environment. Additionally, as newer transport layer protocols like QUIC became popular, JA3’s methodology – originally designed for older client implementations of TLS and not including modern protocols – proved ineffective.</p></li></ul>
    <div>
      <h2>Enter JA4 fingerprint</h2>
      <a href="#enter-ja4-fingerprint">
        
      </a>
    </div>
    <p>In response to these challenges, <a href="https://foxio.io/"><u>FoxIO</u></a> developed JA4, a successor to JA3 that offers a more robust, adaptable, and reliable method for fingerprinting TLS clients across various protocols, including emerging standards like QUIC. Officially launched in September 2023, JA4 is part of the broader <a href="https://blog.foxio.io/ja4%2B-network-fingerprinting"><u>JA4+ suite</u></a> that includes fingerprints for multiple protocols such as TLS, HTTP, and SSH. This suite is designed to be interpretable by both humans and machines, thereby enhancing threat detection and security analysis capabilities.</p><p>JA4 fingerprint is resistant to the randomization of TLS extensions and incorporates additional useful dimensions, such as Application Layer Protocol Negotiation (ALPN), which were not part of JA3. The introduction of JA4 has been met with positive reception in the cybersecurity community, with several open-source tools and commercial products beginning to incorporate it into their systems, including <a href="https://developers.cloudflare.com/bots/concepts/ja3-ja4-fingerprint/"><u>Cloudflare</u></a>. The JA4 fingerprint is available under the <a href="https://github.com/FoxIO-LLC/ja4/blob/main/License%20FAQ.md"><u>BSD 3-Clause license</u></a>, promoting seamless upgrades from JA3. Other fingerprints within the suite, such as JA4S (TLS Server Response) and JA4H (HTTP Client Fingerprinting), are licensed under the proprietary FoxIO License, which is designed for broader use but requires specific arrangements for commercial monetization.</p><p>Let’s take a look at specific JA4 fingerprint example, representing the latest version of Google Chrome on Linux:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7gjWV3tr6fAzSFNq9Z8Xeu/360f0079d987ebc8f8c61f4596b158be/2361-2.png" />
          </figure><ol><li><p><b>Protocol Identifier (t): </b>Indicates the use of TLS over TCP. This identifier is crucial for determining the underlying protocol, distinguishing it from <i>q</i> for QUIC or <i>d</i> for DTLS.</p></li><li><p><b>TLS Version (13): </b>Represents TLS version 1.3, confirming that the client is using one of the latest secure protocols. The version number is derived from analyzing the highest version supported in the ClientHello, excluding any <a href="https://www.rfc-editor.org/rfc/rfc8701.html"><u>GREASE</u></a> values.</p></li><li><p><b>SNI Presence (d): </b>The presence of a domain name in the <a href="https://www.cloudflare.com/en-gb/learning/ssl/what-is-sni/"><u>Server Name Indication</u></a>. This indicates that the client specifies a domain (d), rather than an IP address (i would indicate the absence of SNI).</p></li><li><p><b>Cipher Suites Count (15): </b>Reflects the total number of cipher suites included in the ClientHello, excluding any GREASE values. It provides insight into the cryptographic options the client is willing to use.</p></li><li><p><b>Extensions Count (16): </b>Indicates the count of distinct extensions presented by the client in the ClientHello. This measure helps identify the range of functionalities or customizations the client supports.</p></li><li><p><b>ALPN Values (h2): </b>Represents the Application-Layer Protocol Negotiation protocol, in this case, HTTP/2, which indicates the protocol preferences of the client for optimized web performance.</p></li><li><p><b>Cipher Hash (8daaf6152771): </b>A truncated SHA256 hash of the list of cipher suites, sorted in hexadecimal order. This unique hash serves as a compact identifier for the client’s cipher suite preferences.</p></li><li><p><b>Extension Hash (02713d6af862): </b>A truncated SHA256 hash of the sorted list of extensions combined with the list of signature algorithms. This hash provides a unique identifier that helps differentiate clients based on the extensions and signature algorithms they support.</p></li></ol><p>Here is a <a href="https://www.wireshark.org/"><u>Wireshark</u></a> example of TLS ClientHello from the latest Chrome on Linux querying <a href="https://www.cloudflare.com"><u>https://www.cloudflare.com</u></a>:</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3a1jNGnnYTNZbyshIvWhtb/ead13d6dfdcef44a433bdd3f9c72952e/2361-3.png" />
          </figure><p>Integrating JA4 support into Cloudflare required rethinking our approach to parsing TLS ClientHello messages, which were previously handled in separate implementations across C, Lua, and Go. Recognizing the need to boost performance and ensure memory safety, we developed a new Rust-based crate, client-hello-parser. This unified parser not only simplifies modifications by centralizing all related logic but also prepares us for future transitions, such as replacing nginx with an upcoming Rust-based service. Additionally, this streamlined parser facilitates the exposure of JA4 fingerprints across our platform, improving the integration with Cloudflare's firewall rules, Workers, and analytics systems.</p>
    <div>
      <h2>Parsing ClientHello</h2>
      <a href="#parsing-clienthello">
        
      </a>
    </div>
    <p>client-hello-parser is an internal Rust crate designed for parsing TLS ClientHello messages. It aims to simplify the process of analyzing TLS traffic by providing a straightforward way to decode and inspect the initial handshake messages sent by clients when establishing TLS connections. This crate efficiently populates a ClientHelloParsed struct with relevant parsed fields, including version 1 and version 2 fingerprints, and JA3 and JA4 hashes, which are essential for network traffic analysis and fingerprinting.</p><p>Key benefits of the client-hello-parser library include:</p><ul><li><p><b>Optimized memory usage</b>: The library achieves amortized zero heap allocations, verified through extensive testing with the <a href="https://crates.io/crates/dhat"><u>dhat</u></a> crate to track memory allocations. Utilizing the <a href="https://crates.io/crates/tinyvec"><u>tiny_vec</u></a> crate, it begins with stack allocations for small vectors backed by fixed-size arrays, resorting to heap allocations only when these vectors exceed their initial size. This method ensures efficient reuse of all vectors, maintaining amortized zero heap allocations.</p></li><li><p><b>Memory safety:</b> Reinforced by Rust's robust borrow checker and complemented by extensive fuzzing, which has helped identify and resolve potential security vulnerabilities previously undetected in C implementations.</p></li><li><p><b>Ultra-low latency</b>: The parser benefits from using <a href="https://crates.io/crates/faster-hex"><u>faster_hex</u></a> for efficient hex encoding/decoding, which utilizes SIMD instructions to speed up processing. The use of Rust iterators also helps in optimizing performance, often allowing the compiler to generate SIMD-optimized assembly code. This efficiency is further enhanced through the use of BigEndianIterator, which allows for efficient streaming-like processing of TLS ClientHello bytes in a single pass.</p></li></ul><p>Parser benchmark results:</p>
            <pre><code>client_hello_benchmark/parse/parse-short-502
                        time:   [497.15 ns 497.23 ns 497.33 ns]
                        thrpt:  [2.0107 Melem/s 2.0111 Melem/s 2.0115 Melem/s]
client_hello_benchmark/parse/parse-long-1434
                        time:   [992.82 ns 993.55 ns 994.99 ns]
                        thrpt:  [1.0050 Melem/s 1.0065 Melem/s 1.0072 Melem/s]</code></pre>
            <p>
The benchmark results demonstrate that the parser efficiently handles different sizes of ClientHello messages, with shorter messages being processed at a rate of approximately 2 million elements per second, and longer messages at around 1 million elements per second, showcasing the effectiveness of SIMD optimizations and Rust's iterator performance in real-world applications.</p><p><b>Robust testing suite:</b> Includes dozens of real-life TLS ClientHello message examples, with parsed components verified against Wireshark with <a href="https://github.com/fullylegit/ja3"><u>JA3</u></a> and <a href="https://github.com/FoxIO-LLC/ja4/tree/main/wireshark"><u>JA4</u></a> plugins. Additionally, <a href="https://github.com/rust-fuzz/cargo-fuzz"><u>Cargo fuzzer</u></a> with memory sanitizer ensures no memory leaks or edge cases leading to core dumps. Backward compatibility tests with the legacy C parser, imported as a dependency and called via FFI, confirm that both parsers yield equivalent results.</p><p><b>Seamless integration with nginx</b>: The crate, compiled as a dynamic library, is linked to the nginx binary, ensuring a smooth transition from the legacy parser to the new Rust-based parser through backwards compatibility tests.</p><p>The transition to a new Rust-based parser has enabled the retirement of multiple implementations across different languages (C, Lua, and Go), significantly enhancing performance and parser robustness against edge cases. This shift also facilitates the easier integration of new features and business logic for parsing TLS ClientHello messages, streamlining future expansions and security updates.</p><p>With Cloudflare JA4 fingerprints implemented on our network, we were left with another problem to solve. When JA3 was released, we saw some scenarios where customers were surprised by traffic from a new JA3 fingerprint and blocked it, only to find the fingerprint was a new browser release, or an OS update had caused a change in the fingerprint used by their mobile device. By giving customers just a hash, customers still lack context. We wanted to give our customers the necessary context to help them make informed decisions about the safety of a fingerprint, so they can act quickly and confidently on it. As more of our customers embrace AI, we’ve heard more demand from our customers to break out the signals that power our bot detection. These customers want to run complex models on proprietary data that has to stay in their control, but they want to have Cloudflare’s unique perspective on Internet traffic when they do it. To us, both use cases sounded like the same problem. </p>
    <div>
      <h2>Enter JA4 Signals </h2>
      <a href="#enter-ja4-signals">
        
      </a>
    </div>
    <p>In the ever-evolving landscape of web security, traditional fingerprinting techniques like JA3 and JA4 have proven invaluable for identifying and managing web traffic. However, these methods alone are not sufficient to address the sophisticated tactics employed by malicious agents. Fingerprints can be easily spoofed, they change frequently, and traffic patterns and behaviors are constantly evolving. This is where JA4 Signals come into play, providing a robust and comprehensive approach to traffic analysis.</p><p>JA4 Signals are inter-request features computed based on the last hour of all traffic that Cloudflare sees globally. On a daily basis, we analyze over <b>15 million</b> unique JA4 fingerprints generated from more than 500 million user agents and billions of IP addresses. This breadth of data enables JA4 Signals to provide aggregated statistics that offer deeper insights into global traffic patterns – far beyond what single-request or connection fingerprinting can achieve. These signals are crucial for enhancing security measures, whether through simple firewall rules, Workers scripts, or advanced machine learning models.</p><p>Let's consider a specific example of JA4 Signals from a Firewall events activity log, which involves the latest version of Chrome:</p><p>This example highlights that a particular HTTP request received a Bot Score of 95, suggesting it likely originated from a human user operating a browser rather than an automated program or a bot. Analyzing JA4 Signals in this context provides deeper insight into the behavior of this client (latest Linux Chrome) in comparison to other network clients and their respective JA4 fingerprints. Here are a few examples of the signals our customers can see on any request:</p><table><tr><td><p><b><u>JA4 Signal</u></b></p></td><td><p><b><u>Description</u></b></p></td><td><p><b><u>Value example</u></b></p></td><td><p><b><u>Interpretation</u></b></p></td></tr><tr><td><p>browser_ratio_1h</p></td><td><p>The ratio of requests originating from browser-based user agents for the JA4 fingerprint in the last hour. Higher values suggest a higher proportion of browser-based requests.</p></td><td><p>0.942</p></td><td><p>Indicates a 94.2% browser-based request rate for this JA4.</p></td></tr><tr><td><p>cache_ratio_1h</p></td><td><p>The ratio of cacheable responses for the JA4 fingerprint in the last hour. Higher values suggest a higher proportion of responses that can be cached.</p></td><td><p>0.534</p></td><td><p>Shows a 53.4% cacheable response rate for this JA4.</p></td></tr><tr><td><p>h2h3_ratio_1h</p></td><td><p>The ratio of HTTP/2 and HTTP/3 requests combined with the total number of requests for the JA4 fingerprint in the last hour. Higher values indicate a higher proportion of HTTP/2 and HTTP/3 requests compared to other protocol versions.</p></td><td><p>0.987</p></td><td><p>Reflects a 98.7% rate of HTTP/2 and HTTP/3 requests.</p></td></tr><tr><td><p>reqs_quantile_1h</p></td><td><p>The quantile position of the JA4 fingerprint based on the number of requests across all fingerprints in the last hour. Higher values indicate a relatively higher number of requests compared to other fingerprints.</p></td><td><p>1</p></td><td><p>High volume of requests compared to other JA4s.</p></td></tr></table><p>The JA4 fingerprint and JA4 Signals are now available in the Firewall Rules UI, Bot Analytics and Workers. Customers can now use these fields to write custom rules, rate-limiting rules, transform rules, or Workers logic using JA4 fingerprint and JA4 Signals. </p><p>Let's demonstrate how to use JA4 Signals with the following Worker example. This script processes incoming requests by parsing and categorizing JA4 Signals, providing a clear structure for further analysis or rule application within Cloudflare Workers:</p>
            <pre><code>/**
 * Event listener for 'fetch' events. This triggers on every request to the worker.
 */
addEventListener('fetch', event =&gt; {
  event.respondWith(handleRequest(event.request))
})

/**
 * Main handler for incoming requests.
 * @param {Request} request - The incoming request object from the fetch event.
 * @returns {Response} A response object with JA4 Signals in JSON format.
 */
async function handleRequest(request) {
  // Safely access the ja4Signals object using optional chaining, which prevents errors if properties are undefined.
  const ja4Signals = request.cf?.botManagement?.ja4Signals || {};

  // Construct the response content, including both the original ja4Signals and the parsed signals.
  const responseContent = {
    ja4Signals: ja4Signals,
    jaSignalsParsed: parseJA4Signals(ja4Signals)
  };

  // Return a JSON response with appropriate headers.
  return new Response(JSON.stringify(responseContent), {
    status: 200,
    headers: {
      "content-type": "application/json;charset=UTF-8"
    }
  })
}

/**
 * Parses the JA4 Signals into categorized groups based on their names.
 * @param {Object} ja4Signals - The JA4 Signals object that may contain various metrics.
 * @returns {Object} An object with categorized JA4 Signals: ratios, ranks, and quantiles.
 */
function parseJA4Signals(ja4Signals) {
  // Define the keys for each category of signals.
  const ratios = ['h2h3_ratio_1h', 'heuristic_ratio_1h', 'browser_ratio_1h', 'cache_ratio_1h'];
  const ranks = ['uas_rank_1h', 'paths_rank_1h', 'reqs_rank_1h', 'ips_rank_1h'];
  const quantiles = ['reqs_quantile_1h', 'ips_quantile_1h'];

  // Return an object with each category containing only the signals that are present.
  return {
    ratios: filterKeys(ja4Signals, ratios),
    ranks: filterKeys(ja4Signals, ranks),
    quantiles: filterKeys(ja4Signals, quantiles)
  };
}

/**
 * Filters the keys in the ja4Signals object that match the list of specified keys and are not undefined.
 * @param {Object} ja4Signals - The JA4 Signals object.
 * @param {Array&lt;string&gt;} keys - An array of keys to filter from the ja4Signals object.
 * @returns {Object} A filtered object containing only the specified keys that are present in ja4Signals.
 */
function filterKeys(ja4Signals, keys) {
  const filtered = {};
  // Iterate over the specified keys and add them to the filtered object if they exist in ja4Signals.
  keys.forEach(key =&gt; {
    // Check if the key exists and is not undefined to handle optional presence of each signal.
    if (ja4Signals &amp;&amp; ja4Signals[key] !== undefined) {
      filtered[key] = ja4Signals[key];
    }
  });
  return filtered;
}</code></pre>
            
    <div>
      <h2><b>Benefits of JA4 Signals</b></h2>
      <a href="#benefits-of-ja4-signals">
        
      </a>
    </div>
    <ul><li><p><b>Comprehensive traffic analysis</b>: JA4 Signals aggregate data over an hour to provide a holistic view of traffic patterns. This method enhances the ability to identify emerging threats and abnormal behaviors by analyzing changes over time rather than in isolation.</p></li><li><p><b>Precision in anomaly detection</b>: Leveraging detailed inter-request features, JA4 Signals enable the precise detection of anomalies that may be overlooked by single-request fingerprinting. This leads to more accurate identification of sophisticated cyber threats.</p></li><li><p><b>Globally scalable insights</b>: By synthesizing data at a global scale, JA4 Signals harness the strength of Cloudflare’s network intelligence. This extensive analysis makes the system less susceptible to manipulation and provides a resilient foundation for security protocols.</p></li><li><p><b>Dynamic security enforcement</b>: JA4 Signals can dynamically inform security rules, from simple firewall configurations to complex machine learning algorithms. This adaptability ensures that security measures evolve in tandem with changing traffic patterns and emerging threats.</p></li><li><p><b>Reduction in false positives and negatives</b>: With the detailed insights provided by JA4 Signals, security systems can distinguish between legitimate and malicious traffic more effectively, reducing the occurrence of false positives and negatives and improving overall system reliability.</p></li></ul>
    <div>
      <h2>Conclusion</h2>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>The introduction of JA4 fingerprint and JA4 Signals marks a significant milestone in advancing Cloudflare’s security offerings, including Bot Management and <a href="https://www.cloudflare.com/ddos/"><u>DDoS protection</u></a>. These tools not only enhance the robustness of our traffic analysis but also showcase the continuous evolution of our network fingerprinting techniques. The efficiency of computing JA4 fingerprints enables real-time detection and response to emerging threats. Similarly, by leveraging aggregated statistics and inter-request features, JA4 Signals provide deep insights into traffic patterns at speeds measured in microseconds, ensuring that no detail is too small to be captured and analyzed.</p><p>These security features are underpinned by the scalable techniques and open-sourced libraries outlined in <a href="https://blog.cloudflare.com/scalable-machine-learning-at-cloudflare"><u>"Every request, every microsecond: scalable machine learning at Cloudflare"</u></a>. This discussion highlights how Cloudflare's innovations not only analyze vast amounts of data but also transform this analysis into actionable, reliable, and dynamically adaptable security measures.</p><p>Any Enterprise business with a bot problem will benefit from Cloudflare’s unique JA4 implementation and our perspective on bot traffic, but customers who run their own internal threat models will also benefit from access to data insights from a network that processes over 50 million requests per second. Please <a href="https://www.cloudflare.com/plans/enterprise/contact/"><u>get in touch</u></a> with us to learn more about our Bot Management offering.</p> ]]></content:encoded>
            <category><![CDATA[Bot Management]]></category>
            <category><![CDATA[Threat Intelligence]]></category>
            <category><![CDATA[WAF]]></category>
            <category><![CDATA[Application Services]]></category>
            <guid isPermaLink="false">4sRriOEqIpi6j3IvpnSB6B</guid>
            <dc:creator>Alex Bocharov</dc:creator>
            <dc:creator>Adam Martinetti</dc:creator>
        </item>
        <item>
            <title><![CDATA[Declare your AIndependence: block AI bots, scrapers and crawlers with a single click]]></title>
            <link>https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/</link>
            <pubDate>Wed, 03 Jul 2024 13:00:26 GMT</pubDate>
            <description><![CDATA[ To help preserve a safe Internet for content creators, we’ve just launched a brand new “easy button” to block all AI bots. It’s available for all customers, including those on our free tier ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/D59Fq5QkC4J7Jjo5lM4Fm/fcc55b665562d321bd84f88f53f46b22/image7-1.png" />
            
            </figure><p>To help preserve a safe Internet for content creators, we’ve just launched a brand new “easy button” to <a href="https://www.cloudflare.com/learning/ai/how-to-block-ai-crawlers/">block all AI bots</a>. It’s available for all customers, including those on our free tier.</p><p>The popularity of <a href="https://www.cloudflare.com/learning/ai/what-is-generative-ai/">generative AI</a> has made the demand for content used to train models or run inference on skyrocket, and, although some AI companies clearly identify their web scraping bots, not all AI companies are being transparent. Google reportedly <a href="https://www.reuters.com/technology/reddit-ai-content-licensing-deal-with-google-sources-say-2024-02-22/">paid $60 million a year</a> to license Reddit’s user generated content, and most recently, Perplexity has been <a href="https://rknight.me/blog/perplexity-ai-is-lying-about-its-user-agent/">accused of impersonating legitimate visitors</a> in order to scrape content from websites. The value of original content in bulk has never been higher.</p><p>Last year, <a href="/ai-bots">Cloudflare announced the ability for customers to easily block AI bots</a> that behave well. These bots follow <a href="https://www.cloudflare.com/learning/bots/what-is-robots-txt/">robots.txt</a>, and don’t use unlicensed content to train their models or run inference for <a href="https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/">RAG</a> applications using website data. Even though these AI bots follow the rules, Cloudflare customers overwhelmingly opt to <a href="https://www.cloudflare.com/learning/ai/how-to-prevent-web-scraping/">block them</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5aAA77Hl9OM2vtI611QcUI/0992e096262e348b451efd8be296fa27/image9.png" />
            
            </figure><p>We hear clearly that customers don’t want AI bots visiting their websites, and especially those that do so dishonestly. To help, we’ve added a brand new one-click to block all AI bots. It’s available for all customers, including those on the free tier. To enable it, simply navigate to the <a href="https://dash.cloudflare.com/?to=/:account/:zone/security/bots/configure">Security &gt; Bots</a> section of the Cloudflare dashboard, and click the toggle labeled AI Scrapers and Crawlers.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/xD0lhy89vZb34dtIWukt1/3e0e1ed979a33d344e53d4da2a819e1e/image2.png" />
            
            </figure><p>This feature will automatically be updated over time as we see new fingerprints of offending bots we identify as widely scraping the web for model training. To ensure we have a comprehensive understanding of all AI crawler activity, we surveyed traffic across our network.</p>
    <div>
      <h3>AI bot activity today</h3>
      <a href="#ai-bot-activity-today">
        
      </a>
    </div>
    <p>The graph below illustrates the most popular AI bots seen on Cloudflare’s network in terms of their request volume. We looked at common AI crawler user agents and aggregated the number of requests on our platform from these AI user agents over the last year:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/13pNq4MJJB92Dcs1ghxC6k/b7e7acc7e65e9e0958eed5d4b4cb0594/image6.png" />
            
            </figure><p>When looking at the number of requests made to Cloudflare sites, we see that <i>Bytespider</i>, <i>Amazonbot</i>, <i>ClaudeBot</i>, and <i>GPTBot</i> are the top four AI crawlers. Operated by ByteDance, the Chinese company that owns TikTok, <i>Bytespider</i> is reportedly used to gather training data for its large language models (LLMs), including those that support its ChatGPT rival, Doubao. <i>Amazonbot</i> and <i>ClaudeBot</i> follow <i>Bytespider</i> in request volume. <i>Amazonbot</i>, reportedly used to index content for Alexa’s question-answering, sent the second-most number of requests and <i>ClaudeBot</i>, used to train the Claude chat bot, has recently increased in request volume.</p><p>Among the top AI bots that we see, <i>Bytespider</i> not only leads in terms of number of requests but also in both the extent of its Internet property crawling and the frequency with which it is blocked. Following closely is <i>GPTBot</i>, which ranks second in both crawling and being blocked. <i>GPTBot</i>, managed by OpenAI, collects training data for its LLMs, which underpin AI-driven products such as ChatGPT. In the table below, “Share of websites accessed” refers to the proportion of websites protected by Cloudflare that were accessed by the named AI bot.</p>
<table><thead>
  <tr>
    <th><span>AI Bot</span></th>
    <th><span>Share of Websites Accessed</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>Bytespider</span></td>
    <td><span>40.40%</span></td>
  </tr>
  <tr>
    <td><span>GPTBot</span></td>
    <td><span>35.46%</span></td>
  </tr>
  <tr>
    <td><span>ClaudeBot</span></td>
    <td><span>11.17%</span></td>
  </tr>
  <tr>
    <td><span>ImagesiftBot</span></td>
    <td><span>8.75%</span></td>
  </tr>
  <tr>
    <td><span>CCBot</span></td>
    <td><span>2.14%</span></td>
  </tr>
  <tr>
    <td><span>ChatGPT-User</span></td>
    <td><span>1.84%</span></td>
  </tr>
  <tr>
    <td><span>omgili</span></td>
    <td><span>0.10%</span></td>
  </tr>
  <tr>
    <td><span>Diffbot</span></td>
    <td><span>0.08%</span></td>
  </tr>
  <tr>
    <td><span>Claude-Web</span></td>
    <td><span>0.04%</span></td>
  </tr>
  <tr>
    <td><span>PerplexityBot</span></td>
    <td><span>0.01%</span></td>
  </tr>
</tbody></table><p>While our analysis identified the most popular crawlers in terms of request volume and number of Internet properties accessed, many customers are likely not aware of the more popular AI crawlers actively crawling their sites. Our Radar team performed an analysis of the top robots.txt entries across the <a href="https://radar.cloudflare.com/domains">top 10,000 Internet domains</a> to identify the most commonly actioned AI bots, then looked at how frequently we saw these bots on sites protected by Cloudflare.</p><p>In the graph below, which looks at disallowed crawlers for these sites, we see that customers most often reference <i>GPTBot, CCBot</i>, and <i>Google</i> in robots.txt, but do not specifically disallow popular AI crawlers like <i>Bytespider</i> and <i>ClaudeBot</i>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6m4jV8g9sQ0BLR7OIonsoB/a4c3100a34160c96aea07c4ed4bc6a8d/image3.png" />
            
            </figure><p>With the Internet now flooded with these AI bots, we were curious to see how website operators have already responded. In June, AI bots accessed around 39% of the top one million Internet properties using Cloudflare, but only 2.98% of these properties took measures to block or challenge those requests. Moreover, the higher-ranked (more popular) an Internet property is, the more likely it is to be targeted by AI bots, and correspondingly, the more likely it is to block such requests.</p>
<table><thead>
  <tr>
    <th><span>Top N Internet properties by number of visitors seen by Cloudflare</span></th>
    <th><span>% accessed by AI bots</span></th>
    <th><span>% blocking AI bots</span></th>
  </tr></thead>
<tbody>
  <tr>
    <td><span>10</span></td>
    <td><span>80.0%</span></td>
    <td><span>40.0%</span></td>
  </tr>
  <tr>
    <td><span>100</span></td>
    <td><span>63.0%</span></td>
    <td><span>16.0%</span></td>
  </tr>
  <tr>
    <td><span>1,000</span></td>
    <td><span>53.2%</span></td>
    <td><span>8.8%</span></td>
  </tr>
  <tr>
    <td><span>10,000</span></td>
    <td><span>47.99%</span></td>
    <td><span>8.92%</span></td>
  </tr>
  <tr>
    <td><span>100,000</span></td>
    <td><span>44.53%</span></td>
    <td><span>6.36%</span></td>
  </tr>
  <tr>
    <td><span>1,000,000</span></td>
    <td><span>38.73%</span></td>
    <td><span>2.98%</span></td>
  </tr>
</tbody></table>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6gCWVJMv9GajRT3H8BQ5EM/effb5e2b52c0bdecb99f5f4e339c8d1d/image4.png" />
            
            </figure><p>We see website operators completely block access to these AI crawlers using robots.txt. However, these blocks are reliant on the bot operator respecting robots.txt and adhering to <a href="https://www.rfc-editor.org/rfc/rfc9309.html#name-the-user-agent-line">RFC9309</a> (ensuring variations on user against all match the product token) to honestly identify who they are when they visit an Internet property, but user agents are trivial for bot operators to change.</p>
    <div>
      <h3>How we find AI bots pretending to be real web browsers</h3>
      <a href="#how-we-find-ai-bots-pretending-to-be-real-web-browsers">
        
      </a>
    </div>
    <p>Sadly, we’ve observed bot operators attempt to appear as though they are a real browser by using a spoofed user agent. We’ve monitored this activity over time, and we’re proud to say that our global machine learning model has always recognized this activity as a bot, even when operators lie about their user agent.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4JpBRAGuQ1DOCTSFu9yHbH/9c11b569a30f68ddb1b4c197054ed1c8/image1.png" />
            
            </figure><p>Take one example of a specific bot that <a href="https://rknight.me/blog/perplexity-ai-is-lying-about-its-user-agent/">others</a> observed to be <a href="https://www.wired.com/story/perplexity-is-a-bullshit-machine/">hiding their activity</a>. We ran an analysis to see how our machine learning models scored traffic from this bot. In the diagram below, you can see that all <a href="https://developers.cloudflare.com/bots/concepts/bot-score/">bot scores</a> are firmly below 30, indicating that our scoring thinks this activity is likely to be coming from a bot.</p><p>The diagram reflects scoring of the requests using <a href="/residential-proxy-bot-detection-using-machine-learning">our newest model</a>, where “hotter” colors indicate more requests falling in that band, and “cooler” colors meaning fewer requests did. We can see the vast majority of requests fell into the bottom two bands, showing that Cloudflare’s model gave the offending bot a score of 9 or less. The user agent changes have no effect on the score, because this is the very first thing we expect bot operators to do.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1y0G6D2b512V1UAgR6sooD/4cc9b659f091e84facbec66a30baafad/image5.png" />
            
            </figure><p>Any customer with an existing WAF rule set to challenge visitors with a bot score below 30 (our recommendation) automatically blocked all of this AI bot traffic with no new action on their part. The same will be true for future AI bots that use similar techniques to hide their activity.</p><p>We leverage Cloudflare global signals to calculate our Bot Score, which for AI bots like the one above, reflects that we correctly identify and score them as a “likely bot.”</p><p>When bad actors attempt to crawl websites at scale, they generally use tools and frameworks that we are able to fingerprint. For every fingerprint we see, we use Cloudflare’s network, which sees over 57 million requests per second on average, to understand how much we should trust this fingerprint. To power our models, we compute global aggregates across many signals. Based on these signals, our models were able to appropriately flag traffic from evasive AI bots, like the example mentioned above, as bots.</p><p>The upshot of this globally aggregated data is that we can immediately detect new scraping tools and their behavior without needing to manually fingerprint the bot, ensuring that customers stay protected from the newest waves of bot activity.</p><p>If you have a tip on an AI bot that’s not behaving, we’d love to investigate. There are two options you can use to report misbehaving AI crawlers:</p><p>1. Enterprise Bot Management customers can submit a False Negative <a href="https://developers.cloudflare.com/bots/concepts/feedback-loop/">Feedback Loop</a> report via Bot Analytics by simply selecting the segment of traffic where they noticed misbehavior:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/iwX6mvdnqg3KGRN0dMHou/a7c0a39275680db58f49c9292ca180c7/image8.png" />
            
            </figure><p>2. We’ve also set up a <a href="https://docs.google.com/forms/d/14bX0RJH_0w17_cAUiihff5b3WLKzfieDO4upRlo5wj8/edit">reporting tool</a> where any Cloudflare customer can submit reports of an AI bot scraping your website without permission.</p><p>We fear that some AI companies intent on circumventing rules to access content will persistently adapt to evade bot detection. We will continue to keep watch and add more bot blocks to our AI Scrapers and Crawlers rule and evolve our machine learning models to help keep the Internet a place where content creators can thrive and keep full control over which models their content is used to train or run inference on.</p> ]]></content:encoded>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[Bot Management]]></category>
            <category><![CDATA[AI Bots]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Machine Learning]]></category>
            <category><![CDATA[Generative AI]]></category>
            <guid isPermaLink="false">4iUvyS3jKebfV9pHwg7pol</guid>
            <dc:creator>Alex Bocharov</dc:creator>
            <dc:creator>Santiago Vargas</dc:creator>
            <dc:creator>Adam Martinetti</dc:creator>
            <dc:creator>Reid Tatoris</dc:creator>
            <dc:creator>Carlos Azevedo</dc:creator>
        </item>
        <item>
            <title><![CDATA[Using machine learning to detect bot attacks that leverage residential proxies]]></title>
            <link>https://blog.cloudflare.com/residential-proxy-bot-detection-using-machine-learning/</link>
            <pubDate>Mon, 24 Jun 2024 13:00:17 GMT</pubDate>
            <description><![CDATA[ Cloudflare's Bot Management team has released a new Machine Learning model for bot detection (v8), focusing on bots and abuse from residential proxies ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Bots using residential proxies are a major source of frustration for security engineers trying to fight online abuse. These engineers often see a similar pattern of abuse when well-funded, modern botnets target their applications. Advanced bots bypass country blocks, <a href="https://www.cloudflare.com/en-gb/learning/network-layer/what-is-an-autonomous-system/">ASN</a> blocks, and rate-limiting. Every time, the bot operator moves to a new IP address space until they blend in perfectly with the “good” traffic, mimicking real users’ behavior and request patterns. Our new Bot Management machine learning model (v8) identifies residential proxy abuse without resorting to IP blocking, which can cause false positives for legitimate users.  </p>
    <div>
      <h2>Background</h2>
      <a href="#background">
        
      </a>
    </div>
    <p>One of the main sources of Cloudflare’s <a href="https://developers.cloudflare.com/bots/concepts/bot-score/">bot score</a> is our bot detection machine learning model which analyzes, on average, over 46 million HTTP requests per second in real time. Since our first Bot Management ML model was released in 2019, we have continuously evolved and improved the model. Nowadays, our models leverage features based on request fingerprints, behavioral signals, and global statistics and trends that we see across our network.</p><p>Each iteration of the model focuses on certain areas of improvement. This process starts with a rigorous R&amp;D phase to identify the emerging patterns of <a href="https://www.cloudflare.com/learning/bots/what-is-a-bot-attack/">bot attacks</a> by reviewing <a href="https://developers.cloudflare.com/bots/concepts/feedback-loop/">feedback from our customers</a> and reports of missed attacks. In v8, we mainly focused on two areas of abuse. First, we analyzed the campaigns that leverage residential IP proxies, which are proxies on residential networks commonly used to launch widely distributed attacks against high profile targets. In addition to that, we improved model accuracy for detecting attacks that originate from cloud providers.</p>
    <div>
      <h3>Residential IP proxies</h3>
      <a href="#residential-ip-proxies">
        
      </a>
    </div>
    <p>Proxies allow attackers to hide their identity and distribute their attack. Moreover, IP address rotation allows attackers to directly bypass traditional defenses such as IP reputation and IP rate limiting. Knowing this, defenders use a plethora of signals to identify malicious use of proxies. In its simplest forms, IP reputation signals (e.g., data center IP addresses, known open proxies, etc.) can lead to the detection of such distributed attacks.</p><p>However, in the past few years, bot operators have started favoring proxies operating in residential network IP address space. By using residential IP proxies, attackers can masquerade as legitimate users by sending their traffic through residential networks. Nowadays, residential IP proxies are offered by companies that facilitate access to large pools of IP addresses for attackers. Residential proxy providers claim to offer 30-100 million IPs belonging to residential and mobile networks across the world. Most commonly, these IPs are sourced by partnering with free VPN providers, as well as including the proxy SDKs into popular browser extensions and mobile applications. This allows residential proxy providers to gain a foothold on victims’ devices and abuse their residential network connections.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/Oob9FiycD6yIfYf8Xtdny/06e74f2dd0032ea4610dcab55b0ec38a/residential-proxy-architecture.jpg" />
            
            </figure><p>Figure 1: Architecture of a residential proxy network</p><p>Figure 1 depicts the architecture of a residential proxy. By subscribing to these services, attackers gain access to an authenticated proxy gateway address commonly using the HTTPS/<a href="https://datatracker.ietf.org/doc/html/rfc1928">SOCKS5</a> proxy protocol. Some residential proxy providers allow their users to select the country or region for the proxy exit nodes. Alternatively, users can choose to keep the same IP address throughout their session or rotate to a new one for each outgoing request. Residential proxy providers then identify active exit nodes on their network (on devices that they control within residential networks across the world) and route the proxied traffic through them.</p><p>The large pool of IP addresses and the diversity of networks poses a challenge to traditional bot defense mechanisms that rely on IP reputation and rate limiting. Moreover, the diversity of IPs enables the attackers to rotate through them indefinitely. This shrinks the window of opportunity for bot detection systems to effectively detect and stop the attacks. Effective defense against residential proxy attacks should be able to detect this type of bot traffic either based on single request features to stop the attack immediately, or identify unique fingerprints from the browsing agent to track and mitigate the bot traffic regardless of the IP source. Overly broad blocking actions, such as IP block-listing, by definition, would result in blocking legitimate traffic from residential networks where at least one device is acting as a residential proxy node.</p>
    <div>
      <h3>ML model training</h3>
      <a href="#ml-model-training">
        
      </a>
    </div>
    <p>At its heart, our model is built using a chain of modules that work together. Initially, we fetch and prepare training and validation datasets from our Clickhouse data storage. We use datasets with high confidence labels as part of our training. For model validation, we use datasets consisting of missed attacks reported by our customers, known sources of bot traffic (e.g., <a href="https://developers.cloudflare.com/bots/reference/verified-bots-policy/">verified bots</a>), and high confidence detections from other bot management modules (e.g., heuristics engine). We orchestrate these steps using Apache Airflow, which enables us to customize each stage of the ML model training and define the interdependencies of our training, validation, and reporting modules in the form of directed acyclic graphs (DAGs).</p><p>The first step of training a new model is fetching labeled training data from our data store. Under the hood, our dataset definitions are SQL queries that will materialize by fetching data from our Clickhouse cluster where we store feature values and calculate aggregates from the traffic on our network. Figure 2 depicts these steps as train and validation dataset fetch operations. Introducing new datasets can be as straightforward as writing the SQL queries to filter the desired subset of requests.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3xmgPc689UWAHZi29CfIrp/c0d5dca3b3a497423b00c9456ea5dc7c/airflow-dag.jpg" />
            
            </figure><p>Figure 2: Airflow DAG for model training and validation</p><p>After fetching the datasets, we train our <a href="https://catboost.ai/">Catboost model</a> and tune its <a href="https://catboost.ai/en/docs/references/training-parameters/">hyper parameters</a>. During evaluation, we compare the performance of the newly trained model against the current default version running for our customers. To capture the intricate patterns in subsets of our data, we split certain validation datasets into smaller slivers called specializations. For instance, we use the detections made by our heuristics engine and managed rulesets as ground truth for bot traffic. To ensure that larger sources of traffic (large <a href="https://www.cloudflare.com/learning/network-layer/what-is-an-autonomous-system/">ASNs</a>, different HTTP versions, etc.) do not mask our visibility into patterns for the rest of the traffic, we define specializations for these sources of traffic. As a result, improvements in accuracy of the new model can be evaluated for common patterns (e.g., HTTP/1.1 and HTTP/2) as well as less common ones. Our model training DAG will provide a breakdown report for the accuracy, score distribution, feature importance, and <a href="https://shap.readthedocs.io/en/latest/generated/shap.Explainer.html">SHAP explainers</a> for each validation dataset and its specializations.</p><p>Once we are happy with the validation results and model accuracy, we evaluate our model against a checklist of steps to ensure the correctness and validity of our model. We start by ensuring that our results and observations are reproducible over multiple non-overlapping training and validation time ranges. Moreover, we check for the following factors:</p><ul><li><p>Check for the distribution of feature values to identify irregularities such as missing or skewed values.</p></li><li><p>Check for overlaps between training and validation datasets and feature values.</p></li><li><p>Verify the diversity of training data and the balance between labels and datasets.</p></li><li><p>Evaluate performance changes in the accuracy of the model on validation datasets based on their order of importance.</p></li><li><p>Check for model overfitting by evaluating the feature importance and SHAP explainers.</p></li></ul><p>After the model passes the readiness checks, we deploy it in shadow mode. We can observe the behavior of the model on live traffic in log-only mode (i.e., without affecting the <a href="https://developers.cloudflare.com/bots/concepts/bot-score/">bot score</a>). After gaining confidence in the model's performance on live traffic, we start onboarding beta customers, and gradually switch the model to active mode all while closely <a href="/monitoring-machine-learning-models-for-bot-detection">monitoring the real-world performance of our new model</a>.</p>
    <div>
      <h3>ML features for bot detection</h3>
      <a href="#ml-features-for-bot-detection">
        
      </a>
    </div>
    <p>Each of our models uses a set of features to make inferences about the incoming requests. We compute our features based on single request properties (single request features) and patterns from multiple requests (i.e., inter-request features). We can categorize these features into the following groups:</p><ul><li><p><b>Global features:</b> inter-request features that are computed based on global aggregates for different types of fingerprints and traffic sources (e.g., for an ASN) seen across our global network. Given the relatively lower cardinality of these features, we can scalably calculate global aggregates for each of them.</p></li><li><p><b>High cardinality features:</b> inter-request features focused on fine-grained aggregate data from local traffic patterns and behaviors (e.g., for an individual IP address)</p></li><li><p><b>Single request features:</b> features derived from each individual request (e.g., user agent).</p></li></ul><p>Our Bot Management system (named <a href="/scalable-machine-learning-at-cloudflare">BLISS</a>) is responsible for fetching and computing these feature values and making them available on our servers for inference by active versions of our ML models.</p>
    <div>
      <h2>Detecting residential proxies using network and behavioral signals</h2>
      <a href="#detecting-residential-proxies-using-network-and-behavioral-signals">
        
      </a>
    </div>
    <p>Attacks originating from residential IP addresses are commonly characterized by a spike in the overall traffic towards sensitive endpoints on the target websites from a large number of residential ASNs. Our approach for detecting residential IP proxies is twofold. First, we start by comparing direct vs proxied requests and looking for network level discrepancies. Revisiting Figure 1, we notice that a request routed through residential proxies (red dotted line) has to traverse through multiple hops before reaching the target, which affects the network latency of the request.</p><p>Based on this observation alone, we are able to characterize residential proxy traffic with a high true positive rate (i.e., all residential proxy requests have high network latency). While we were able to replicate this in our lab environment, we quickly realized that at the scale of the Internet, we run into numerous exceptions with false positive detections (i.e., non-residential proxy traffic with high latency). For instance, countries and regions that predominantly use satellite Internet would exhibit a high network latency for the majority of their requests due to the use of <a href="https://datatracker.ietf.org/doc/html/rfc3135">performance enhancing proxies</a>.</p><p>Realizing that relying solely on network characteristics of connections to detect residential proxies is inadequate given the diversity of the connections on the Internet, we switched our focus to the behavior of residential IPs. To that end, we observe that the IP addresses from residential proxies express a distinct behavior during periods of peak activity. While this observation singles out highly active IPs over their peak activity time, given the pool size of residential IPs, it is not uncommon to only observe a small number of requests from the majority of residential proxy IPs.</p><p>These periods of inactivity can be attributed to the temporary nature of residential proxy exit nodes. For instance, when the client software (i.e., browser or mobile application) that runs the exit nodes of these proxies is closed, the node leaves the residential proxy network. One way to filter out periods of inactivity is to increase the monitoring time and punish each IP address that exhibits residential proxy behavior for a period of time. This block-listing approach, however, has certain limitations. Most importantly, by relying only on IP-based behavioral signals, we would block traffic from legitimate users that may unknowingly run mobile applications or browser extensions that turn their devices into proxies. This is further detrimental for mobile networks where many users share their IPs behind <a href="https://en.wikipedia.org/wiki/Carrier-grade_NAT">CGNATs</a>. Figure 3 demonstrates this by comparing the share of direct vs proxied requests that we received from active residential proxy IPs over a 24-hour period. Overall, we see that 4 out of 5 requests from these networks belong to direct and benign connections from residential devices.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5vdkbz94Am2EHlL7tu9t3t/3256ae94bfee66c60ad71c6a6dbc7fc0/mlv8-blog-proxy-vs-direct.jpg" />
            
            </figure><p>Figure 3: Percentage of direct vs proxied requests from residential proxy IPs.</p><p>Using this insight, we combined behavioral and latency-based features along with new datasets to train a new <a href="https://www.cloudflare.com/learning/ai/what-is-machine-learning/">machine learning model</a> that detects residential proxy traffic on a per-request basis. This scheme allows us to block residential proxy traffic while allowing benign residential users to visit Cloudflare-protected websites from the same residential network.</p>
    <div>
      <h2>Detection results and case studies</h2>
      <a href="#detection-results-and-case-studies">
        
      </a>
    </div>
    <p>We started testing v8 in shadow mode in March 2024. Every hour, v8 is classifying more than 17 million unique IPs that participate in residential proxy attacks. Figure 4 shows the geographic distribution of IPs with residential proxy activity belonging to more than 45 thousand ASNs in 237 countries/regions. Among the most commonly requested endpoints from residential proxies, we observe patterns of account takeover attempts, such as requests to /login, /auth/login, and /api/login.  </p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3gGGhqngSzEkThZUoV3Fiw/d807f5456e3277e4aa24972588c59cf7/mlv8-blog-map-1.jpg" />
            
            </figure><p>Figure 4: Countries and regions with residential network activity. Size of markers are proportionate to the number of IPs with residential proxy activity.</p><p>Furthermore, we see significant improvements when evaluating our new machine learning model on previously missed attacks reported by our customers. In one case, v8 was able to correctly classify 95% of requests from distributed residential proxy attacks targeting the voucher redemption endpoint of the customer’s website. In another case, our new model successfully detected a previously missed <a href="https://www.cloudflare.com/learning/bots/what-is-content-scraping/">content scraping attack</a> evident by increased detection during traffic spikes depicted in Figure 5. We are continuing to monitor the behavior of residential proxy attacks in the wild and work with our customers to ensure that we can provide robust detection against these distributed attacks.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1jstvGe6HudxtAS54UJPW6/9bde592cb0efc64fb7a86ba33bb25047/shadowmode-v8.jpg" />
            
            </figure><p>Figure 5: Spikes in bot requests from residential proxies detected by ML v8</p>
    <div>
      <h2>Improving detection for bots from cloud providers</h2>
      <a href="#improving-detection-for-bots-from-cloud-providers">
        
      </a>
    </div>
    <p>In addition to residential IP proxies, bot operators commonly use cloud providers to host and run bot scripts that attack our customers. To combat these attacks, we improved our ground truth labels for cloud provider attacks in our latest ML training datasets. Early results show that v8 detects 20% more bots from cloud providers, with up to 70% more bots detected on zones that are marked as <a href="https://developers.cloudflare.com/fundamentals/reference/under-attack-mode/">under attack</a>. We further plan to expand the list of cloud providers that v8 detects as part of our ongoing updates.</p>
    <div>
      <h2>Check out ML v8</h2>
      <a href="#check-out-ml-v8">
        
      </a>
    </div>
    <p>For existing Bot Management customers we recommend <a href="https://developers.cloudflare.com/bots/reference/machine-learning-models/#enable-auto-updates-to-the-machine-learning-models">toggling “Auto-update machine learning model”</a> to instantly gain the benefits of ML v8 and its residential proxy detection, and to stay up to date with our future ML model updates. If you’re not a Cloudflare Bot Management customer, <a href="https://www.cloudflare.com/application-services/products/bot-management/">contact our sales team</a> to try out <a href="https://www.cloudflare.com/application-services/products/bot-management/">Bot Management</a>.</p> ]]></content:encoded>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Machine Learning]]></category>
            <category><![CDATA[AI]]></category>
            <category><![CDATA[Proxying]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[Bot Management]]></category>
            <category><![CDATA[Application Services]]></category>
            <guid isPermaLink="false">2EZrHNgKqLkaTGqoRM9pMS</guid>
            <dc:creator>Bob AminAzad</dc:creator>
            <dc:creator>Santiago Vargas</dc:creator>
            <dc:creator>Adam Martinetti</dc:creator>
        </item>
        <item>
            <title><![CDATA[Integrating Turnstile with the Cloudflare WAF to challenge fetch requests]]></title>
            <link>https://blog.cloudflare.com/integrating-turnstile-with-the-cloudflare-waf-to-challenge-fetch-requests/</link>
            <pubDate>Mon, 18 Dec 2023 14:00:17 GMT</pubDate>
            <description><![CDATA[ By editing or creating a new Turnstile widget with “Pre-Clearance” enabled, Cloudflare customers can now use Turnstile to issue a challenge when a page’s HTML loads, and enforce that all valid responses have a valid Turnstile token ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3UV6CnIMI92jBmCr4VeqCU/98b0de9d9ca221f3d60bc7d02213264c/image8.png" />
            
            </figure><p>Two months ago, we made Cloudflare Turnstile <a href="/turnstile-ga/">generally available</a> — giving website owners everywhere an easy way to fend off bots, without ever issuing a CAPTCHA. Turnstile allows any website owner to embed a frustration-free Cloudflare challenge on their website with a simple code snippet, making it easy to help ensure that only human traffic makes it through. In addition to protecting a website’s frontend, Turnstile also empowers web administrators to harden browser-initiated (AJAX) API calls running under the hood. These APIs are commonly used by dynamic single-page web apps, like those created with React, Angular, Vue.js.</p><p>Today, we’re excited to announce that we have integrated Turnstile with the <a href="https://www.cloudflare.com/application-services/products/waf/">Cloudflare Web Application Firewall (WAF)</a>. This means that web admins can add the Turnstile code snippet to their websites, and then configure the Cloudflare WAF to manage these requests. This is completely customizable using WAF Rules; for instance, you can allow a user authenticated by Turnstile to interact with all of an application’s API endpoints without facing any further challenges, or you can configure certain sensitive endpoints, like Login, to always issue a challenge.</p>
    <div>
      <h3>Challenging fetch requests in the Cloudflare WAF</h3>
      <a href="#challenging-fetch-requests-in-the-cloudflare-waf">
        
      </a>
    </div>
    <p>Millions of websites protected by Cloudflare’s WAF leverage our JS Challenge, Managed Challenge, and Interactive Challenge to stop bots while letting humans through. For each of these challenges, Cloudflare intercepts the matching request and responds with an HTML page rendered by the browser, where the user completes a basic task to demonstrate that they’re human. When a user successfully completes a challenge, they receive a <a href="https://developers.cloudflare.com/fundamentals/reference/policies-compliances/cloudflare-cookies/#additional-cookies-used-by-the-challenge-platform">cf_clearance cookie</a>, which tells Cloudflare that a user has successfully passed a challenge, the type of challenge, and when it was completed. A clearance cookie can’t be shared between users, and is only valid for the time set by the Cloudflare customer in their Security Settings dashboard.</p><p>This process works well, except when a browser receives a challenge on a fetch request and the browser has not previously passed a challenge. On a fetch request, or an XML HTTP Request (XHR), the browser expects to get back simple text (in JSON or XML formats) and cannot render the HTML necessary to run a challenge.</p><p>As an example, let’s imagine a pizzeria owner who built an online ordering form in React with a payment page that submits data to an API endpoint that processes payments. When a user views the web form to add their credit card details they can pass a Managed Challenge, but when the user submits their credit card details by making a fetch request, the browser won’t execute the code necessary for a challenge to run. The pizzeria owner’s only option for handling suspicious (but potentially legitimate) requests is to block them, which runs the risk of false positives that could cause the restaurant to lose a sale.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7fOg2KPmEgB5nyeywCc0X0/8ddf84d382f902ad633fb30a3f8226a2/Group-3955.png" />
            
            </figure><p>This is where Turnstile can help. Turnstile allows anyone on the Internet to embed a Cloudflare challenge anywhere on their website. Before today, the output of Turnstile was only a one-time use token. To enable customers to issue challenges for these fetch requests, Turnstile can now issue a clearance cookie for the domain that it's embedded on. Customers can issue their challenge within the HTML page before a fetch request, <i>pre-clearing</i> the visitor to interact with the Payment API.</p>
    <div>
      <h3>Turnstile Pre-Clearance mode</h3>
      <a href="#turnstile-pre-clearance-mode">
        
      </a>
    </div>
    <p>Returning to our pizzeria example, the three big advantages of using Pre-Clearance to integrate Turnstile with the Cloudflare WAF are:</p><ol><li><p><b>Improved user experience</b>: Turnstile’s embedded challenge can run in the background while the visitor is entering their payment details.</p></li><li><p><b>Blocking more requests at the edge</b>: Because Turnstile now issues a clearance cookie for the domain that it’s embedded on, our pizzeria owner can use a Custom Rule to issue a Managed Challenge for every request to the payment API. This ensures that automated attacks attempting to target the payment API directly are stopped by Cloudflare before they can reach the API.</p></li><li><p><b>(Optional) Securing the action and the user</b>: No backend code changes are necessary to get the benefit of Pre-Clearance. However, further Turnstile integration will increase security for the integrated API. The pizzeria owner can adjust their payment form to <a href="https://developers.cloudflare.com/turnstile/get-started/server-side-validation/">validate the received Turnstile token</a>, ensuring that every payment attempt is individually validated by Turnstile to protect their payment endpoint from session hijacking.</p></li></ol>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1Er3Qa9TvxheeCOxbYeCQh/50afffa59cef839aba3a256484ea6ea5/Pre-clearance.png" />
            
            </figure><p>A Turnstile widget with Pre-Clearance enabled will still issue turnstile tokens, which gives customers the flexibility to decide if an endpoint is critical enough to require a security check on every request to it, or just once a session. Clearance cookies issued by a Turnstile widget are automatically applied to the Cloudflare zone the Turnstile widget is embedded on, with no configuration necessary. The clearance time the token is valid for is still controlled by the zone specific “Challenge Passage” time.</p>
    <div>
      <h3>Implementing Turnstile with Pre-Clearance</h3>
      <a href="#implementing-turnstile-with-pre-clearance">
        
      </a>
    </div>
    <p>Let’s make this concrete by walking through a basic implementation. Before we start, we’ve set up a simple demo application where we emulate a frontend talking to a backend on a <code>/your-api</code> endpoint.</p><p>To this end, we have the following code:</p>
            <pre><code>&lt;!DOCTYPE html&gt;
&lt;html lang="en"&gt;
&lt;head&gt;
   &lt;title&gt;Turnstile Pre-Clearance Demo &lt;/title&gt;
&lt;/head&gt;
&lt;body&gt;
  &lt;main class="pre-clearance-demo"&gt;
    &lt;h2&gt;Pre-clearance Demo&lt;/h2&gt;
    &lt;button id="fetchBtn"&gt;Fetch Data&lt;/button&gt;
    &lt;div id="response"&gt;&lt;/div&gt;
&lt;/main&gt;

&lt;script&gt;
  const button = document.getElementById('fetchBtn');
  const responseDiv = document.getElementById('response');
  button.addEventListener('click', async () =&gt; {
  try {
    let result = await fetch('/your-api');
    if (result.ok) {
      let data = await result.json();
      responseDiv.textContent = JSON.stringify(data);
    } else {
      responseDiv.textContent = 'Error fetching data';
    }
  } catch (error) {
    responseDiv.textContent = 'Network error';
  }
});
&lt;/script&gt;</code></pre>
            <p>We've created a button. Upon clicking, Cloudflare makes a <code>fetch()</code> request to the <code>/your-api</code> endpoint, showing the result in the response container.</p><p>Now let’s consider that we have a Cloudflare WAF rule set up that protects the <code>/your-api</code> endpoint with a Managed Challenge.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1sjpmlJe4atSe3ztUjbL2M/99335880b870554a9c1dd3e5c8d70614/pasted-image-0-3.png" />
            
            </figure><p>Due to this rule, the app that we just wrote is going to fail for the reason described earlier (the browser is expecting a JSON response, but instead receives the challenge page as HTML).</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4HJrHiNoxjmSdRwEcZrYuA/a62895eaa382e160eb17fce51acde32c/Screenshot-2023-12-18-at-12.00.16.png" />
            
            </figure><p>If we inspect the Network Tab, we can see that the request to <code>/your-api</code> has been given a 403 response.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2DiC7Lus2CVxUJKw5pr7mi/ab45a3af70f411998ebb4892977a255d/image10.png" />
            
            </figure><p>Upon inspection, the Cf-Mitigated header shows that the response was challenged by Cloudflare’s firewall, as the visitor has not solved a challenge before.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2D81qpEEa60G1W1pZMUr2U/f04a2571ed6f52a16f6bf28adaee9ee4/image6.png" />
            
            </figure><p>To address this problem in our app, we set up a Turnstile Widget in Pre-Clearance mode for the Turnstile sitekey that we want to use.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6xLOm4TvNFK32gEf45I1XF/7e04c0a1fdc746a64dc8cf1e08ec3bf1/image2-4.png" />
            
            </figure><p>In our application, we override the <code>fetch()</code> function to invoke Turnstile once a Cf-Mitigated response has been received.</p>
            <pre><code>&lt;script&gt;
turnstileLoad = function () {
  // Save a reference to the original fetch function
  const originalFetch = window.fetch;

  // A simple modal to contain Cloudflare Turnstile
  const overlay = document.createElement('div');
  overlay.style.position = 'fixed';
  overlay.style.top = '0';
  overlay.style.left = '0';
  overlay.style.right = '0';
  overlay.style.bottom = '0';
  overlay.style.backgroundColor = 'rgba(0, 0, 0, 0.7)';
  overlay.style.border = '1px solid grey';
  overlay.style.zIndex = '10000';
  overlay.style.display = 'none';
  overlay.innerHTML =       '&lt;p style="color: white; text-align: center; margin-top: 50vh;"&gt;One more step before you proceed...&lt;/p&gt;&lt;div style=”display: flex; flex-wrap: nowrap; align-items: center; justify-content: center;” id="turnstile_widget"&gt;&lt;/div&gt;';
  document.body.appendChild(overlay);

  // Override the native fetch function
  window.fetch = async function (...args) {
      let response = await originalFetch(...args);

      // If the original request was challenged...
      if (response.headers.has('cf-mitigated') &amp;&amp; response.headers.get('cf-mitigated') === 'challenge') {
          // The request has been challenged...
          overlay.style.display = 'block';

          await new Promise((resolve, reject) =&gt; {
              turnstile.render('#turnstile_widget', {
                  'sitekey': ‘YOUR_TURNSTILE_SITEKEY',
                  'error-callback': function (e) {
                      overlay.style.display = 'none';
                      reject(e);
                  },
                  'callback': function (token, preClearanceObtained) {
                      if (preClearanceObtained) {
                          // The visitor successfully solved the challenge on the page. 
                          overlay.style.display = 'none';
                          resolve();
                      } else {
                          reject(new Error('Unable to obtain pre-clearance'));
                      }
                  },
              });
          });

          // Replay the original fetch request, this time it will have the cf_clearance Cookie
          response = await originalFetch(...args);
      }
      return response;
  };
};
&lt;/script&gt;
&lt;script src="https://challenges.cloudflare.com/turnstile/v0/api.js?onload=turnstileLoad" async defer&gt;&lt;/script&gt;</code></pre>
            <p>There is a lot going on in the snippet above: First, we create a hidden overlay element and override the browser’s <code>fetch()</code> function. The <code>fetch()</code> function is changed to introspect the Cf-Mitigated header for ‘challenge’. If a challenge is issued, the initial result will be unsuccessful; instead, a Turnstile overlay (with Pre-Clearance enabled) will appear in our web application. Once the Turnstile challenge has been completed we will retry the previous request after Turnstile has obtained the cf_clearance cookie to get through the Cloudflare WAF.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1HNSoEaOmTMmQFuc8kKY2p/1877b884856e092cfc51637f3f050c2c/image1-2.png" />
            
            </figure><p>Upon solving the Turnstile widget, the overlay disappears, and the requested API result is shown successfully:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7aPtaSfD7JdV0jYb1iDX50/dd9807c4807f6234dcb453471f43db99/Screenshot-2023-12-18-at-12.02.56.png" />
            
            </figure>
    <div>
      <h3>Pre-Clearance is available to all Cloudflare customers</h3>
      <a href="#pre-clearance-is-available-to-all-cloudflare-customers">
        
      </a>
    </div>
    <p>Every Cloudflare user with a <a href="https://www.cloudflare.com/plans/free/">free plan</a> or above can use Turnstile in managed mode free for an unlimited number of requests. If you’re a Cloudflare user looking to improve your security and user experience for your critical API endpoints, head over to our dashboard and <a href="https://dash.cloudflare.com/?to=/:account/turnstile">create a Turnstile widget with Pre-Clearance</a> today.</p> ]]></content:encoded>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[CAPTCHA]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[Turnstile]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Micro-frontends]]></category>
            <guid isPermaLink="false">1aYnXBUBD1B2KvKgz0veFW</guid>
            <dc:creator>Adam Martinetti</dc:creator>
            <dc:creator>Benedikt Wolters</dc:creator>
            <dc:creator>Miguel de Moura</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare is free of CAPTCHAs; Turnstile is free for everyone]]></title>
            <link>https://blog.cloudflare.com/turnstile-ga/</link>
            <pubDate>Fri, 29 Sep 2023 13:00:00 GMT</pubDate>
            <description><![CDATA[ Now that we’ve eliminated CAPTCHAs at Cloudflare, we want to hasten the demise of CAPTCHAs across the internet. We’re thrilled to announce that Turnstile is generally available, and Turnstile’s ‘Managed’ mode is now completely free to everyone for unlimited use.  ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2562yydO3PNFG88W5iTE0P/ee8cda8c9929f566e738c0e0f75b2a9b/image3-37.png" />
            
            </figure><p>For years, we’ve <a href="/moving-from-recaptcha-to-hcaptcha/">written</a> that CAPTCHAs drive us crazy. Humans give up on CAPTCHA puzzles <a href="https://www.math.unipd.it/~gaggi/doc/ads20.pdf">approximately 15% of the time</a> and, maddeningly, <a href="https://www.usenix.org/conference/usenixsecurity23/presentation/searles">CAPTCHAs are significantly easier for bots</a> to solve than they are for humans. We’ve spent the past three and a half years working to build a better experience for humans that’s just as effective at stopping bots. As of this month, we’ve finished replacing every CAPTCHA issued by Cloudflare with Turnstile, our new <a href="https://www.cloudflare.com/products/turnstile/">CAPTCHA replacement</a> (pictured below). Cloudflare will never issue another visual puzzle to anyone, for any reason.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/10LzRAr38KzxAANQIVxwZT/0fe5ec0867c70f8217a6deff4b244f9b/image2.gif" />
            
            </figure><p>Now that we’ve eliminated CAPTCHAs at Cloudflare, we want to make it easy for anyone to do the same, even if they don’t use other Cloudflare services. We’ve decoupled Turnstile from our platform so that any website operator on any platform can use it just by adding <a href="https://github.com/cloudflare/turnstile-demo-workers/blob/main/src/explicit.html#L74-L85">a few lines of code</a>. We’re thrilled to announce that Turnstile is now generally available, and <b>Turnstile’s ‘Managed’ mode is now completely free to everyone for unlimited use</b>.</p>
    <div>
      <h3>Easy on humans, hard on bots, private for everyone</h3>
      <a href="#easy-on-humans-hard-on-bots-private-for-everyone">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6DQmrvGrrHUPlLMHrknjyY/99ea339af6278970204cb33bcdf5520f/image6-5.png" />
            
            </figure><p>There’s a lot that goes into Turnstile’s simple checkbox to ensure that it’s easy for everyone, preserves user privacy, and does its job stopping <a href="https://www.cloudflare.com/learning/bots/what-is-a-bot/">bots</a>. Part of making challenges better for everyone means that everyone gets the same great experience, no matter what browser you’re using. Because we do not employ a visual puzzle, users with low vision or blindness get the same easy to use challenge flow as everyone else.</p><p>It was particularly important for us to avoid falling back to audio CAPTCHAs to offer an experience accessible to everyone. Audio CAPTCHAs are often much worse than even visual CAPTCHAs for humans to solve, with only <a href="https://web.stanford.edu/~jurafsky/burszstein_2010_captcha.pdf">31.2% of audio challenges</a> resulting in a three-person agreement on what the correct solution actually is. The prevalence of free speech-to-text services has made it easy for bots to solve audio CAPTCHAs as well, with <a href="https://uncaptcha.cs.umd.edu/papers/uncaptcha_woot17.pdf">a recent study</a> showing bots can accurately solve audio CAPTCHAs in over 85% of attempts. We’re proud to state that Turnstile is WCAG 2.1 Level AA compliant, while eliminating the need for audio CAPTCHAs as well as visual ones.</p><p>We also created Turnstile to be privacy focused. Turnstile meets <a href="https://www.cloudflare.com/learning/privacy/what-is-eprivacy-directive/">ePrivacy Directive</a>, <a href="https://www.cloudflare.com/learning/privacy/what-is-the-gdpr/">GDPR</a> and <a href="https://www.cloudflare.com/learning/privacy/what-is-the-ccpa/">CCPA</a> compliance requirements, as well as the strict requirements of our own privacy commitments. In addition, Cloudflare's <a href="https://marketplace.fedramp.gov/products/FR2000863987">FedRAMP Moderate authorized package</a>, "Cloudflare for Government" now includes Turnstile. We don’t rely on tracking user data, like what other websites someone has visited, to determine if a user is a human or robot. Our business is protecting websites, not selling ads, so operators can deploy Turnstile knowing that their users’ data is safe.</p><p>With all of our emphasis on how <i>easy</i> it is to pass a Turnstile challenge, you would be right to ask how it can stop a bot. If a bot can find <a href="https://www.vox.com/22436832/captchas-getting-harder-ai-artificial-intelligence">all images with crosswalks</a> in grainy photos faster than we can, surely it can check a box as well. Bots definitely can check a box, and they can even <a href="https://arxiv.org/abs/1903.01003">mimic the erratic path of human mouse movement</a> while doing so. For Turnstile, the actual act of checking a box isn’t important, it’s the background data we’re analyzing while the box is checked that matters. We find and stop bots by running a series of in-browser tests, checking browser characteristics, native browser APIs, and asking the browser to pass lightweight tests (ex: proof-of-work tests, proof-of-space tests) to prove that it’s an actual browser. The current deployment of Turnstile checks billions of visitors every day, and we are able to identify browser abnormalities that bots exhibit while attempting to pass those tests.</p><p>For over one year, <a href="/end-cloudflare-captcha/">we used our Managed Challenge</a> to rotate between CAPTCHAs and our own Turnstile challenge to compare our effectiveness. We found that <b>even without asking users for any interactivity at all</b>, Turnstile was just as effective as a CAPTCHA. Once we were sure that the results were effective at coping with the response from bot makers, we replaced the CAPTCHA challenge with our own checkbox solution. We present this extra test when we see potentially suspicious signals, and it helps us provide an even greater layer of security.</p>
    <div>
      <h3>Turnstile is great for fighting fraud</h3>
      <a href="#turnstile-is-great-for-fighting-fraud">
        
      </a>
    </div>
    <p>Like all sites that offer services for free, Cloudflare sees our fair share of automated account signups, which can include “new account fraud,” where bad actors automate the creation of many different accounts to abuse our platform. To help combat this abuse, we’ve rolled out Turnstile’s invisible mode to protect our own signup page. This month, we’ve blocked <b>over</b> <b>1 million automated signup attempts</b> using Turnstile, without a reported false positive or any change in our self-service billings that rely on this signup flow.  </p>
    <div>
      <h3>Lessons from the Turnstile beta</h3>
      <a href="#lessons-from-the-turnstile-beta">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6Fiihb5s0WfTPdtLrqx4ro/7b93471efb6a16ba777e5249405ee726/image5-11.png" />
            
            </figure><p>Over the past twelve months, we’ve been grateful to see how many people are eager to try, then rely on, and integrate Turnstile into their web applications. It’s been rewarding to see the developer community embrace Turnstile as well. We list some of the community created Turnstile integrations <a href="https://developers.cloudflare.com/turnstile/community-resources/">here</a>, including integrations with <a href="https://www.cloudflare.com/integrations/wordpress/">WordPress</a>, Angular, Vue, and a Cloudflare recommended <a href="https://www.npmjs.com/package/@marsidev/react-turnstile">React library</a>. We’ve listened to customer feedback, and added support for <a href="https://developers.cloudflare.com/turnstile/reference/supported-languages/">17 new languages</a>, <a href="https://developers.cloudflare.com/turnstile/get-started/client-side-rendering/">new callbacks</a>, and <a href="https://developers.cloudflare.com/turnstile/reference/client-side-errors/">new error codes</a>.</p><p>76,000+ users have signed up, but our biggest single test by far was the <a href="/how-cloudflare-scaled-and-protected-eurovision-2023-voting/">Eurovision final vote</a>. Turnstile runs on challenge pages on over 25 million Cloudflare websites. Usually, that makes Cloudflare the far and away biggest Turnstile consumer, until the final Eurovision vote. During that one hour, challenge traffic from the Eurovision voting site outpaced the use of challenge pages on those 25 million sites combined! Turnstile handled the enormous spike in traffic without a hitch.</p><p>While a lot went well during the Turnstile beta, we also encountered some opportunities for us to learn. We were initially resistant to disclosing why a Turnstile challenge failed. After all, if bad actors know what we’re looking for, it becomes easier for bots to fool our challenges until we introduce new detections. However, during the Turnstile beta, we saw a few scenarios where legitimate users could not pass a challenge. These scenarios made it clear to us that we need to be transparent about why a challenge failed to help aid any individual who might modify their browser in a way that causes them to get caught by Turnstile. We now publish detailed client-side error codes to surface the reason why a challenge has failed. Two scenarios came up on several occasions that we didn’t expect:</p><p>First, we saw that desktop computers at least 10 years old frequently had expired motherboard batteries, and computers with bad motherboard batteries very often keep inaccurate time. This is because without the motherboard battery, a desktop computer’s clock will stop operating when the computer is off. Turnstile checks your computer’s system time to detect when a website operator has accidentally configured a challenge page to be cached, as caching a challenge page will cause it to become impassable. Unfortunately, this same check was unintentionally catching humans who just needed to update the time. When we see this issue, we now surface a clear error message to the end user to update their system time. We’d prefer to never have to surface an error in the first place, so we’re working to develop new ways to check for cached content that won’t impact real people.</p><p>Second, we find that a few privacy-focused users often ask their browsers to go beyond standard practices to preserve their anonymity. This includes changing their user-agent (something bots will do to evade detection as well), and preventing third-party scripts from executing entirely. Issues caused by this behavior can now be displayed clearly in a Turnstile widget, so those users can immediately understand the issue and make a conscientious choice about whether they want to allow their browser to pass a challenge.</p><p>Although we have some of the most sensitive, thoroughly built monitoring systems at Cloudflare, we did not catch either of these issues on our own. We needed to talk to users affected by the issue to help us understand what the problem was. Going forward, we want to make sure we always have that direct line of communication open. We’re rolling out a new feedback form in the Turnstile widget, to ensure any future corner cases are addressed quickly and with urgency.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/cydzYwhoIVTnaMCPmrYVV/f7ff6163cf69dee1abe00f7b5421cd8f/Screenshot-2023-09-29-at-11.37.58.png" />
            
            </figure>
    <div>
      <h3>Turnstile: GA and Free for Everyone</h3>
      <a href="#turnstile-ga-and-free-for-everyone">
        
      </a>
    </div>
    <p>Announcing Turnstile’s General Availability means that Turnstile is now completely production ready, available for free for unlimited use via our visible widget in Managed mode. Turnstile Enterprise includes SaaS platform support and a visible mode without the Cloudflare logo. Self-serve customers can expect a pay-as-you-go option for advanced features to be available in early 2024. Users can continue to access Turnstile’s advanced features below our 1 million siteverify request limit, as has been the case during the beta. If you’ve been waiting to try Turnstile, head over to our <a href="https://www.cloudflare.com/products/turnstile/">signup page</a> and create an account!</p> ]]></content:encoded>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Turnstile]]></category>
            <category><![CDATA[CAPTCHA]]></category>
            <category><![CDATA[Speed & Reliability]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Bots]]></category>
            <guid isPermaLink="false">3ijPrY6Heu8jsF4JTYQtx6</guid>
            <dc:creator>Benedikt Wolters</dc:creator>
            <dc:creator>Maxime Guerreiro</dc:creator>
            <dc:creator>Adam Martinetti</dc:creator>
        </item>
        <item>
            <title><![CDATA[Super Bot Fight Mode is now configurable!]]></title>
            <link>https://blog.cloudflare.com/configurable-super-bot-fight-mode/</link>
            <pubDate>Thu, 16 Mar 2023 13:00:00 GMT</pubDate>
            <description><![CDATA[ Super Bot Fight Mode can be used with Skip rules now to allow for configurable deployments ]]></description>
            <content:encoded><![CDATA[ <p></p><p>Millions of customers around the world use Cloudflare to keep their applications safe by blocking bot traffic to their website. We block an average of 336 million requests per day for self-service customers using a service called <a href="/super-bot-fight-mode/">Super Bot Fight Mode</a>. It is a crucial part of how customers keep their websites online.</p><p>While most customers use Cloudflare’s <a href="https://radar.cloudflare.com/traffic/verified-bots">Verified Bot</a> directory to securely allow good, automated traffic, some customers also like to write their own localized integration scripts to crawl and update their website, or perform other necessary maintenance functions. Because these bots are only used on a single website, they don’t fit our verified bot criteria the way a Google or Bing crawler does. This makes Super Bot Fight Mode difficult to manage for these types of customers.</p>
    <div>
      <h3>Super Bot Fight Mode: now configurable!</h3>
      <a href="#super-bot-fight-mode-now-configurable">
        
      </a>
    </div>
    <p>Previously, Super Bot Fight Mode ran as an independent service on our global network and other <a href="https://www.cloudflare.com/security/">Cloudflare security services</a> were unable to affect its configuration. To solve this, we’ve rewritten Super Bot Fight Mode behind the scenes. It’s now a new <a href="https://developers.cloudflare.com/waf/managed-rules/">managed ruleset</a> in <a href="/new-cloudflare-waf/">the new WAF</a>, just like the OWASP Core Ruleset or the Cloudflare Managed Ruleset. This doesn’t change the interface, but brings Super Bot Fight Mode closer to where customers are managing their other security exceptions.</p><p>As we speak, the WAF team is carefully <a href="https://developers.cloudflare.com/waf/reference/migration-guides/firewall-rules-to-custom-rules/">migrating all self-serve customers from our old Firewall Rules system to a new system</a>. This new system, called Custom Rules, simplifies the exception process in the rules you write with no other changes or loss of functionality. In the old system we had two separate actions, “allow” and “bypass”. In the new Custom Rules, there’s only one action called “skip”. Rules that “skip” traffic can skip the rest of your custom rules (just like an “allow” rule would) <i>and</i> other Cloudflare services. As Cloudflare customers are given the “Skip” action, you will be able to see the option available to “skip” Super Bot Fight Mode. Here’s an example:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5NAxkGqQh2afrqcmqdKKAd/0b39961bf89323382c5a4544a9a7a9bf/image3-26.png" />
            
            </figure><p>While we spoke to customers about their use cases for skipping Super Bot Fight Mode, one use-case kept popping up that didn’t quite fit the rest: WordPress Loopback requests. As many people know, as part of WordPress’ self-diagnostic capabilities, a WordPress site will make automated requests back to itself over the Internet to confirm its reachability and functionality. These loopback diagnostics can come from dozens of different community developed plugins, each implementing loopback requests slightly differently. To help accommodate an ever-growing diversity in diagnostic tools used in WordPress, we have added a simple configuration option to securely allow these loop-back requests.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1wELIm0qRkZ8VxlRnOa6PK/364901a384a8e8967ac0c9888cf380c1/image2-19.png" />
            
            </figure><p>In the future, we will be integrating this feature with the Cloudflare WordPress plugin to make it even easier to use WordPress with Cloudflare.</p>
    <div>
      <h3>What’s next?</h3>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>Self-serve customers with Custom Rules can create “Skip” rules to create exceptions for Super Bot Fight Mode today. We are currently rolling out Custom Rules to all of our customers. If you do not see this option available now, you should expect to see it in the next several weeks. If the lack of flexibility has prevented you from using Super Bot Fight Mode in the past, please log into the Cloudflare dashboard and try it with these new skip rules!</p><p>While we’ve added flexibility to customers’ Super Bot Fight Mode deployments, we know that Free plan customers want the same level of customization that self-serve customers do. Now that our migration of Super Bot Fight Mode to the new <a href="https://www.cloudflare.com/learning/ddos/glossary/web-application-firewall-waf/">WAF</a> is complete, we plan to do the same for the original Bot Fight Mode to allow more free customers than ever before to join us in the fight against bots.</p> ]]></content:encoded>
            <category><![CDATA[Security Week]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[Bot Fight Mode]]></category>
            <guid isPermaLink="false">7eVoKVVwvc2sxvmuAfkQzg</guid>
            <dc:creator>Adam Martinetti</dc:creator>
        </item>
        <item>
            <title><![CDATA[IBM Cloud works with Cloudflare to help clients modernize and deliver secured cloud infrastructure]]></title>
            <link>https://blog.cloudflare.com/ibm-keyless-bots/</link>
            <pubDate>Thu, 16 Mar 2023 13:00:00 GMT</pubDate>
            <description><![CDATA[ IBM and Cloudflare continue to partner together to help customers meet the unique security, performance, resiliency and compliance needs of their customers through the addition of exciting new product and service offerings. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>In this blog post, we wanted to highlight some ways that Cloudflare and IBM Cloud work together to help drive product innovation and deliver services that address the needs of our mutual customers. On our blog, we often discuss exciting new product developments and how we are solving real-world problems in our effort to make the internet better and many of our customers and partners play an important role.</p><p>IBM Cloud and Cloudflare have been working together since 2018 to integrate Cloudflare <a href="https://www.cloudflare.com/application-services/solutions/">application security and performance products</a> natively into IBM Cloud. <a href="https://www.ibm.com/cloud/cloud-internet-services">IBM Cloud Internet Services</a> (CIS) has customers across a wide range of industry verticals and geographic regions but they also have several specialist groups building unique service offerings.</p><p>The IBM Cloud team specializes in serving clients in highly regulated industries, aiming to ensure their resiliency, performance, security and compliance needs are met. One group that we’ve been working with recently is IBM Cloud for Financial Services. This group extends the capabilities of IBM Cloud to help serve the complex security and compliance needs of banks, financial institutions and fintech companies.</p>
    <div>
      <h3>Bot Management</h3>
      <a href="#bot-management">
        
      </a>
    </div>
    <p>As malicious <a href="https://www.cloudflare.com/learning/bots/what-is-a-bot-attack/">bot attacks</a> get more sophisticated and manual mitigations become more onerous, a dynamic and adaptive solution is required for enterprises running Internet facing workloads. With Cloudflare Bot Management on IBM Cloud Internet Services, we aim to help IBM clients protect their Internet properties from targeted application abuse such as <a href="https://www.cloudflare.com/zero-trust/solutions/account-takeover-prevention/">account takeover attacks</a>, inventory hoarding, carding abuse and more. Bot Management will be available in the second quarter of 2023.</p><p>Threat actors specifically target financial services entities with Account Takeover Attacks, and this is where Cloudflare can help. As much as 71% of login requests we see come from bots (Source: <a href="/grinch-bot/">Cloudflare Data</a>) Cloudflare’s Bot Management is powered by a global <a href="https://www.cloudflare.com/learning/ai/what-is-machine-learning/">machine learning model</a> that analyses an average of 45 million HTTP requests a second to track botnets across our network. Cloudflare’s Bot Management solution has the potential to benefit all IBM CIS customers.</p>
    <div>
      <h3>Supporting banks, financial institutions, and fintechs</h3>
      <a href="#supporting-banks-financial-institutions-and-fintechs">
        
      </a>
    </div>
    <p>IBM Cloud has been a leader when it comes to providing solutions for the financial services industry and has developed several key management solutions that are designed so clients only need to store their private keys in custom built devices.</p><p>The IBM CIS team wants to incorporate the right mix of security and performance, which necessitates the use of cloud-based DDoS, <a href="https://www.cloudflare.com/learning/ddos/glossary/web-application-firewall-waf/">WAF</a>, and Bot Management. Specifically, they wanted to incorporate the powerful security tools that were offered through IBM’s Enterprise-level Cloud Internet Services offerings. When using a cloud solution, it is necessary to proxy traffic which can create a potential challenge when it comes to managing private keys. While Cloudflare adopts strict controls to protect these keys, organizations in highly regulated industries may have security policies and compliance requirements that prevent them from sharing these private keys.</p><p>Enter <a href="/keyless-ssl-the-nitty-gritty-technical-details/">Cloudflare’s Keyless SSL solution</a>.</p><p>Cloudflare built Keyless SSL to allow customers to have total control over exactly where private keys are stored. With Keyless SSL and IBM’s key storage solutions, we aim to help enterprises benefit from the robust application protections available through Cloudflare’s WAF, including Cloudflare Bot Management, while still retaining control of their private keys.</p><blockquote><p><i>“We aim to ensure our clients meet their resiliency, performance, security and compliance needs. The introduction of Keyless SSL and Bot Management security capabilities can further our collaborative accomplishments with Cloudflare and help enterprises, including those in regulated industries, to leverage cloud-native security and adaptive threat mitigation tools.”</i><i>— </i><b><i>Zane Adam</i></b><i>, Vice President, IBM Cloud.</i></p></blockquote><blockquote><p><i>“Through our collaboration with IBM Cloud Internet Services, we get to draw on the knowledge and experience of IBM teams, such as the IBM Cloud for Financial Services team, and combine it with our incredible ability to innovate, resulting in exciting new product and service offerings.”</i><i>—</i> <b>David McClure</b>, Global Alliance Manager, Strategic Partnerships</p></blockquote><p>If you want to learn more about how IBM leverages Cloudflare to protect their customers, visit: <a href="https://www.ibm.com/cloud/cloudflare">https://www.ibm.com/cloud/cloudflare</a></p><p>IBM experts are <a href="https://www.ibm.com/cloud?utm_content=SRCWW&amp;p1=Search&amp;p4=43700075067849398&amp;p5=e&amp;gclid=Cj0KCQiAx6ugBhCcARIsAGNmMbi0kbqZJmVOKlzikUuIAxGVvUJP_X-aH7wgt1NeIwqDzL1F392U-UYaAob3EALw_wcB&amp;gclsrc=aw.ds">here</a> to help you if you have any additional questions.</p> ]]></content:encoded>
            <category><![CDATA[Security Week]]></category>
            <category><![CDATA[IBM]]></category>
            <category><![CDATA[Partners]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[Bot Management]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Fraud]]></category>
            <guid isPermaLink="false">8N7YOG8qpViyOTf5N04JJ</guid>
            <dc:creator>David McClure</dc:creator>
            <dc:creator>Adam Martinetti</dc:creator>
        </item>
        <item>
            <title><![CDATA[Announcing Cloudflare Fraud Detection]]></title>
            <link>https://blog.cloudflare.com/cloudflare-fraud-detection/</link>
            <pubDate>Wed, 15 Mar 2023 13:05:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare Fraud Detection uses machine learning models to better protect businesses from fake account signups, account takeover attacks, and carding attacks to ensure businesses can operate online safely ]]></description>
            <content:encoded><![CDATA[ <p></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1hVkayGudspBR3CdMprI5C/31996f5e1db90507e0c9ff6df0b70e68/image1-23.png" />
            
            </figure><p>The world changed when the COVID-19 pandemic began. Everything moved online to a much greater degree: school, work, and, surprisingly, fraud. Although some degree of online fraud has existed for decades, the Federal Trade Commission reported consumers <a href="https://www.ftc.gov/news-events/news/press-releases/2023/02/new-ftc-data-show-consumers-reported-losing-nearly-88-billion-scams-2022">lost almost $8.8 billion in fraud in 2022</a> (an <a href="https://www.ftc.gov/system/files/documents/reports/consumer-sentinel-network-data-book-2019/consumer_sentinel_network_data_book_2019.pdf">over 400% increase since 2019</a>) and the continuation of a disturbing trend. People continue to <a href="https://www.washingtonpost.com/opinions/2022/11/23/americans-alone-thanksgiving-friends/">spend more time alone</a> than ever before, and that time alone makes them not just more targeted, but also <a href="https://www.nature.com/articles/s41599-022-01445-5">more vulnerable to fraud</a>. Companies are falling victim to these trends just as much as individuals: according to PWC’s Global Economic Crime and Fraud Survey, <a href="https://www.pwc.com/gx/en/services/forensics/economic-crime-survey.html?cf_target_id=AF857F14B27E73443176A59CFB4F60C7">more than half of companies with at least $10 billion in revenue experienced some sort of digital fraud</a>.</p><p>This is a familiar story in the world of <a href="https://www.cloudflare.com/learning/bots/what-is-a-bot-attack/">bot attacks</a>. Cloudflare Bot Management helps customers identify the automated tools behind online fraud, but it’s important to note that <i>not all fraud is committed by bots</i>. If the target is valuable enough, bad actors will <a href="https://www.forbes.com/sites/augustinefou/2020/09/19/you-didnt-know-bots-solve-captchas-and-do-pharming-with-humans/?sh=15c47291a4d8">contract out the exploitation of online applications to real people</a>. Security teams need to look at more than just bots to <a href="https://www.cloudflare.com/application-services/solutions/">better secure online applications</a> and tackle modern, online fraud.</p><p>Today, we're excited to announce Cloudflare Fraud Detection. Fraud Detection will give you precise, easy to use tools that can be deployed in seconds to any website on the Cloudflare network to help detect and categorize fraud. For every type of fraud we detect on your website, you will be able to choose the behavior that makes the most sense to you. While some customers will want to block fraudulent traffic at our edge, other customers may want to pass this information in headers to build integrations with their own app, or use our Cloudflare Workers platform to direct high risk users to load an alternate online experience with fewer capabilities.</p>
    <div>
      <h3>The online fraud experience today</h3>
      <a href="#the-online-fraud-experience-today">
        
      </a>
    </div>
    <p>When we talk to organizations impacted by sophisticated, online fraud, the first thing we hear from frustrated security teams is that they <i>know</i> what they could do to stop fraud in a vacuum: they’ve proposed requiring email verification on signup, enforcing two-factor authentication for all logins, or blocking online purchases from anonymizing VPNs or countries they repeatedly see a disproportionately high number of charge-backs from. While all of these measures would undoubtedly reduce fraud, they would also make the user experience worse. The fear for every company is that a bad UX will mean slower adoption and less revenue, and that’s too steep a price to pay for most run-of-the-mill online fraud.</p><p>For those who’ve chosen to preserve that frictionless user experience and bear the cost of fraud, we see two big impacts: <b><i>higher infrastructure costs and less efficient employees</i></b>. Bad actors that abuse account creation endpoints or service availability endpoints often do so with floods of highly distributed HTTP requests, quickly moving through residential proxies to pass under IP based rate limiting rules. Without a way to identify fraudulent traffic with certainty, companies are forced to scale up their infrastructure to be able to serve new peaks in request traffic, even when they know the majority of this traffic is illegitimate. Engineering and Trust and Safety Teams suddenly have a whole new set of responsibilities: regularly banning IP addresses that will probably never be used again, routinely purging fraudulent data from over capacity databases, and even sometimes becoming de-facto fraud investigators. As a result, the organization incurs greater costs without any greater value to their customers.</p>
    <div>
      <h3>Reduce modern fraud without hurting UX</h3>
      <a href="#reduce-modern-fraud-without-hurting-ux">
        
      </a>
    </div>
    <p>Organizations have told us loud and clear that an effective fraud management solution needs to reliably stop bad actors before they can create fraudulent accounts, use stolen credit cards, or steal customer data all the while ensuring a frictionless user experience for real users. We are building novel and highly accurate detections, solving for the four common fraud types we hear the most demand for from businesses around the world:</p><ul><li><p><b>Fake Account Creation</b>: Bad actors signing up for many different accounts to gain access to promotional rewards, or more resources than a single user should have access to.</p></li><li><p><b>Account Takeover</b>: Gaining unauthorized access to legitimate accounts, by means such as using stolen username and password combinations from other websites, guessing weak passwords, or abusing account recovery mechanisms.</p></li><li><p><b>Card Testing and Fraudulent Transactions:</b> Testing the validity of stolen credit card details or using those same details to purchase goods or services.</p></li><li><p><b>Expediting:</b> Obtaining limited availability goods or services by circumventing the normal user flow to complete orders more quickly than should be possible.</p></li></ul><p>In order to trust your fraud management solution, organizations have to understand the decisions or predictions behind the detection of fraud. This is referred to as explainability. For example, it’s not enough to know a signup attempt was flagged as fraud. You need to know, for example, if a signup is fraudulent, exactly what field supplied by the user led us to think this was an issue, why it was an issue, and if it was part of a larger pattern. We will pass along this level of detail when we detect fraud so you can ensure we are only keeping the bad actors out.</p><p>Every business that deals with modern, online fraud has a different idea of what risks are acceptable, and a different preference for dealing with fraud once it’s been identified. To give customers maximum flexibility, we’re building Cloudflare’s fraud detection signals to be used individually, or combined with other <a href="https://www.cloudflare.com/security/">Cloudflare security products</a> in whichever way best fits each customer’s risk profile and use case, all while using the familiar Cloudflare Firewall Rules interface. Templated rules and suggestions will be available to provide guidance and help customers become familiar with the new features, but each customer will have the option of fully customizing how they want to protect each internet application. Customers can either block, rate-limit, or challenge requests at the edge, or send those signals upstream in request headers, to trigger custom in-application behavior.</p><p>Cloudflare provides application performance and security services to millions of sites, and we see 45 million HTTP requests per second on average. The massive diversity and volume of this traffic puts us in a unique position to analyze and defeat online fraud. Cloudflare Bot Management is already built to run our Machine Learning model that detects automated traffic on every request we see. To better tackle more challenging use cases like online fraud, we made our lightning fast Machine Learning even more performant. The typical Machine Learning model now executes in under 0.2 milliseconds, giving us the architecture we need to run multiple specific Machine Learning models in parallel without slowing down content delivery.</p>
    <div>
      <h3>Stopping fake account creation and adding to Cloudflare’s defense in depth</h3>
      <a href="#stopping-fake-account-creation-and-adding-to-cloudflares-defense-in-depth">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6UB0fiyeOu64IjyLtbif7A/dd9d5bd81c9d63b92ecd60624d043987/image3-21.png" />
            
            </figure><p>The first problem our customers asked us to tackle is detecting fake account creation. Cloudflare is perfectly positioned to solve this because we see more account creation pages than anyone else. Using sampled fake account attack data from our customers, we started looking at signup submission data, and how threat intelligence curated by our <a href="/introducing-cloudforce-one-threat-operations-and-threat-research/">Cloudforce One</a> team might be helpful. We found that the data used in our <a href="https://www.cloudflare.com/cloudflare-one/">Cloudflare One products</a> was already able to identify 72% of fake accounts based on the signup details supplied by the bad actor, such as the email address or the domain they’re using in the attack. We are continuing to add more sources of threat intelligence data specific to fake accounts to get this number close to 100%. On top of these threat intelligence based rules, we are also training new machine learning models on this data as well, that will spot trends like popular fraud domains based on intelligence from the millions of domains we see across the Cloudflare network.</p>
    <div>
      <h3>Making fraud inefficient by expediting detection</h3>
      <a href="#making-fraud-inefficient-by-expediting-detection">
        
      </a>
    </div>
    <p>The second problem customers asked us to prioritize is expediting. As a reminder, expediting means visiting a succession of web pages faster than would be possible for a normal user, and sometimes skipping ahead in the order of web pages in order to efficiently exploit a resource.</p><p>For instance, let’s say that you have an Account Recovery page that is being spammed by a sophisticated group of bad actors, looking for vulnerable users they can steal reset tokens for. In this case, the fraudsters have access to a large number of valid email addresses and they’re testing which of these addresses may be used at your website. To prevent your account recovery process from being abused, we need to ensure that no single person can move through the account recovery process faster, or in a different order than a real person would.</p><p>In order to complete a valid password reset action on your site, you may know that a user should have made:</p><ul><li><p>A GET request to render your login page</p></li><li><p>A POST request to the login page (at least one second after receiving the login page HTML)</p></li><li><p>A GET request to render the Account Recovery page (at least one second after receiving the POST response)</p></li><li><p>A POST request to the password reset page (at least one second after receiving the Account Recovery page HTML)</p></li><li><p>Taken a total time of less than 5 seconds to complete the process</p></li></ul><p>To solve this, we will rely on encrypted data stored by the user in a token to help us determine if the user has visited all the necessary pages needed in a reasonable amount of time to be performing sensitive actions on your site. If your account recovery process is being abused, the encrypted token we supply acts as a VIP pass, allowing only authorized users to successfully complete the password recovery process. Without a pass indicating the user has gone through the normal recovery flow in the correct order and time, they are denied entry to complete a password recovery. By forcing the bad actor to behave the same as a legitimate user, we make their task of checking which of their compromised email addresses might be registered at your site an impossibly slow process, forcing them to move on to other targets.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3XLE40OlE3zuriiGqwuVLC/0716f302220ef0f401e3bab2434fdd4a/image4-10.png" />
            
            </figure><p>These are just two of the first techniques we use to identify and block fraud. We are also building <a href="https://www.cloudflare.com/zero-trust/solutions/account-takeover-prevention/">Account Takeover</a> and Carding Abuse detections that we will be talking about in the future on this blog. As online fraud continues to evolve, we will continue to build new and unique detections, leveraging Cloudflare’s unique position to help keep the internet safe.</p>
    <div>
      <h3>Where do I sign up?</h3>
      <a href="#where-do-i-sign-up">
        
      </a>
    </div>
    <p>Cloudflare’s mission is to help build a better Internet, and that includes dealing with the evolution of modern online fraud. If you’re spending hours cleaning up after fraud, or are tired of paying to serve web traffic to bad actors, you can join in the Cloudflare Fraud Detection Early Access in the second half of 2023 by <a href="http://cloudflare.com/lp/fraud-detection">submitting your contact information here</a>. Early Access customers can opt in to providing training data sets right away, making our models more effective for their use cases. You’ll also get test access to our newest models, and future fraud protection features as soon as they roll out.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4PQsBgqrqiyq45qFanzWOB/33a5a20727c8c54c91a3ac1823984e8f/image2-13.png" />
            
            </figure><p></p> ]]></content:encoded>
            <category><![CDATA[Security Week]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[Bot Management]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Fraud]]></category>
            <guid isPermaLink="false">2zgk0zr4VHgU6iLm90E0jJ</guid>
            <dc:creator>Adam Martinetti</dc:creator>
        </item>
        <item>
            <title><![CDATA[More bots, more trees]]></title>
            <link>https://blog.cloudflare.com/more-bots-more-trees/</link>
            <pubDate>Wed, 14 Dec 2022 14:00:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare’s Bot Fight Mode caught 6x more bots in 2022, and we’re contributing to a new tree planting project in West Bengal. ]]></description>
            <content:encoded><![CDATA[ <p><i></i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2SFWx66fKc0jXFhGX6WURT/51fe4c5a01f880180c52c7c2be9882cc/image2-27.png" />
            
            </figure><p>Once a year, we pull data from our Bot Fight Mode to determine the number of trees we can donate to our partners at One Tree Planted. It's part of the <a href="/cleaning-up-bad-bots/">commitment</a> we made in 2019 to deter malicious bots online by redirecting them to a challenge page that requires them to perform computationally intensive, but meaningless tasks. While we use these tasks to drive up the bill for bot operators, we account for the carbon cost by planting trees.</p><p>This year when we pulled the numbers, we saw something exciting. While the number of bot detections has gone significantly up, the time bots spend in the Bot Fight Mode challenge page has gone way down. We’ve observed that bot operators are giving up quickly, and moving on to other, unprotected targets. Bot Fight Mode is getting smarter at detecting bots and more efficient at deterring bot operators, and that’s a win for Cloudflare and the environment.</p>
    <div>
      <h3>What’s changed?</h3>
      <a href="#whats-changed">
        
      </a>
    </div>
    <p>We’ve seen two changes this year in the Bot Fight Mode results. First, the time attackers spend in Bot Fight Mode challenges has reduced by 166%. Many bot operators are disconnecting almost immediately now from Cloudflare challenge pages. We expect this is because they’ve noticed the sharp cost increase associated with our CPU intensive challenge and given up. Even though we’re seeing individual bot operators give up quickly, Bot Fight Mode is busier than ever. We’re issuing six times more CPU intensive challenges per day compared to last year, thanks to a new detection system written using Cloudflare’s ruleset engine, detailed below.</p>
    <div>
      <h3>How did we do this?</h3>
      <a href="#how-did-we-do-this">
        
      </a>
    </div>
    <p>When Bot Fight Mode launched, we highlighted one of our core detection systems:</p><blockquote><p><i>“Handwritten rules for simple bots that, however simple, get used day in, day out.”</i></p></blockquote><p>Some of them are still very simple. We introduce new simple rules regularly when we detect new software libraries as they start to source a significant amount of traffic. However, we started to reach the limitations of this system. We knew there were sophisticated bots out there that we could identify easily, but they shared enough overlapping traits with good browser traffic that we couldn’t safely deploy new rules to block them safely without potentially impacting our customers’ good traffic as well.</p><p>To solve this problem, we built a new rules system written on the same highly performant Ruleset Engine that powers <a href="/new-waf-experience/">the new WAF</a>, <a href="/transform-http-response-headers/">Transform Rules</a>, and <a href="/introducing-cache-rules/">Cache Rules</a>, rather than the old <a href="/cloudflare-bot-management-machine-learning-and-more/">Gagarin heuristics engine</a> that was fast but inflexible. This new framework gives us the flexibility we need to write highly complex rules to catch more elusive bots without the risk of interfering with legitimate traffic. The data gathered by these new detections are then labeled and used to train our <a href="/machine-learning-mobile-traffic-bots/">Machine Learning engine</a>, ensuring we will continue to catch these bots as their operators attempt to adapt.</p>
    <div>
      <h3>What’s next?</h3>
      <a href="#whats-next">
        
      </a>
    </div>
    <p>We’ve heard from Bot Fight Mode customers that they need more flexibility. Website operators now expect a significant percentage of their legitimate traffic to come from automated sources, like service to service APIs. These customers are waiting to enable Bot Fight Mode until they can tell us what parts of their website it can run on safely. In 2023, we will give everyone the ability to write their own flexible Bot Fight Mode rules, so that every Cloudflare customer can join the fight against bots!</p>
    <div>
      <h3>Update: Mangroves, Climate Change &amp; economic development</h3>
      <a href="#update-mangroves-climate-change-economic-development">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2eTGp8yd0bXgNsWWZ9VOPm/74ea0d252310c699ef3909e01ed6b4e0/image1-29.png" />
            
            </figure><p>Source: One Tree Planted</p><p>We're also pleased to report the second tree planting project from our 2021 bot activity is now complete! Earlier this year, Cloudflare <a href="/35-000-new-trees-in-nova-scotia/">contributed</a> 25,000 trees to a restoration project at Victoria Park in Nova Scotia.</p><p>For our second project, we donated 10,000 trees to a much larger restoration project on the eastern shoreline of Kumirmari island in the Sundarbans of West Bengal, India. In total, the project included more than 415,000 trees along 7.74 hectares of land in areas that have been degraded or deforested. The types of trees planted included Bain, Avicennia officianalis, Kalo Bain, and eight others.</p><p>The Sundarbans are located on the delta of the Ganges, Brahmaptura, and Meghna rivers on the Bay of Bengal, and are home to one of the world's largest mangrove forests. The forest is not only a <a href="https://whc.unesco.org/en/list/798/">UNESCO World Heritage</a> site, but also home to 260 bird species as well as a number of threatened species like the Bengal tiger, the estuarine crocodile, and Indian python. According to <a href="https://onetreeplanted.org/">One Tree Planted</a>, the Sundarbans are currently under threat from rising sea levels, increasing salinity in the water and soil, cyclonic storms, and flooding.</p><p>The Intergovernmental Panel on Climate Change (IPCC) has found that mangroves are critical to mitigating greenhouse gas (GHG) emissions and protecting coastal communities from extreme weather events caused by climate change. The Sundarbans mangrove forest is one of the world's largest carbon sinks (an area that absorbs more carbon than it emits). One <a href="https://www.nature.com/articles/s41598-022-11716-5#:~:text=Recent%20researchers%20have%20evaluated%20that,0.5%E2%80%933%20m%20depth17.">study</a> suggested that coastal mangrove forests sequester carbon at a rate of two to four times that of a mature tropical or subtropical forest region.</p><p>One of the most exciting parts of this project was its focus on hiring and empowering local women. According to One Tree Planted, 75 percent of those involved in the project were women, including 85 women employed to monitor and manage the planting site over a five-month period. Participants also received training in the seed collection process with the goal of helping local residents lead mangrove planting from start to finish in the future.</p>
    <div>
      <h3>More bots stopped, more trees planted!</h3>
      <a href="#more-bots-stopped-more-trees-planted">
        
      </a>
    </div>
    <p>Thanks to every Cloudflare customer who’s enabled Bot Fight Mode so far. You’ve helped make the Internet a better place by stopping malicious bots, and you’ve helped make the planet a better place by reforesting the Earth on bot operators’ dime. The more domains that use Bot Fight Mode, the more trees we can plant, so <a href="https://dash.cloudflare.com/signup">sign up for Cloudflare</a> and <a href="https://developers.cloudflare.com/bots/get-started/free/#enable-bot-fight-mode">activate Bot Fight Mode</a> today!</p> ]]></content:encoded>
            <category><![CDATA[Impact Week]]></category>
            <category><![CDATA[Sustainability]]></category>
            <category><![CDATA[Bots]]></category>
            <category><![CDATA[Bot Fight Mode]]></category>
            <category><![CDATA[Policy & Legal]]></category>
            <guid isPermaLink="false">6iUGyPq3gBlCtJErYjEyM2</guid>
            <dc:creator>Adam Martinetti</dc:creator>
            <dc:creator>Patrick Day</dc:creator>
        </item>
    </channel>
</rss>