Cleaning up bad bots (and the climate)

From the very beginning Cloudflare has been stopping malicious bots from scraping websites or misusing APIs. Over time we’ve improved our bot detection methods and deployed large machine learning models that are able to distinguish real traffic (be it from humans or apps) from malicious bots. We’ve also built a large catalog of good bots to detect things like helpful indexing by search engines.

But it’s not enough. Malicious bots continue to be a problem on the Internet and we’ve decided to fight back. Starting today, customers have the option of enabling “Bot Fight Mode” in the Firewall settings of their Cloudflare Dashboard.

Once enabled, when we detect a bad bot, we will do three things: (1) we’re going to disincentivize the bot maker economically by tarpitting them, including requiring them to solve a computationally intensive challenge that will require more of their bot’s CPU; (2) for Bandwidth Alliance partners, we’re going to hand the IP of the bot to the partner and get the bot kicked offline; and (3) we’re going to plant trees to make up for the bot’s carbon cost.

Malicious bots harm legitimate web publishers and applications, hurt hosting providers by misusing resources, and they doubly hurt the planet through the cost of electricity for servers and cooling for their bots and their victims.

Enough is enough. Our goal is nothing short of making it no longer viable to run a malicious bot on the Internet. And we think, with our scale, we can do exactly that.

How Cloudflare Detects Bots

Cloudflare’s secret sauce (ok, not very secret sauce) is our vast scale. We currently handle traffic for over 20 million Internet properties ranging from the smallest personal web sites, through backend APIs for popular apps and IoT devices, to some of the best known names on the Internet (including 10% of the Fortune 1000).

This scale gives us a huge advantage: we see an enormous amount and variety of traffic, which allows us to build large machine learning models of Internet behavior. That scale and variety also allow us to test new rules and models quickly and easily.

Our bot detection breaks down into four large components:

Identification of well known legitimate bots;
Hand written rules for simple bots that, however simple, get used day in, day out;
Our Bot Activity Detector model that spots the behavior of bots based on past traffic and blocks them; and
Our Trusted Client model that spots whether an HTTP User-Agent is what it says it is.

In addition, Gatebot, our DDoS mitigation system, fingerprints DDoS bots and blocks their traffic at the packet level. Beyond Gatebot, customers also have access to our Firewall Rules where they can write granular rules to block very specific attack types.

Another model allows us to determine whether an IP address belongs to a VPN endpoint, a home broadband subscriber, a company using NAT, or a hosting or cloud provider. It’s this last group that “Bot Fight Mode” targets.

Today, Cloudflare challenges over 3 billion bot requests per day. Some of those bots are about to have a really bad time.

How Cloudflare Fights Bots

The cost of launching a bot attack consists of the expense of CPU time that powers the attack. If our models show that the traffic is coming from a bot, and it’s on a hosting or a cloud provider, we’ll deploy CPU intensive code to make the bot writer expend more CPU and slow them down. By forcing the attacker to use more CPU, we increase their costs during an attack and deter future ones.

This is one of the many so-called "tarpitting" techniques we're now deploying across our network to change the economics of running a malicious bot. Malicious bot operators be warned: if you target resources behind Cloudflare's IP space we're going to make you spin your wheels.
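
To make the economics concrete, here is a minimal sketch of one well-known way to build a computationally intensive challenge: a hashcash-style proof of work. This is an illustration only, not a description of the challenge Cloudflare actually serves. The function names, the use of SHA-256, and the difficulty value are all assumptions made for the example. The key property is the asymmetry: the client has to try many nonces, while the server verifies an answer with a single hash.

```python
import hashlib
import itertools
import os


def make_challenge(n_bytes: int = 16) -> bytes:
    """Server side (hypothetical): issue a random challenge string."""
    return os.urandom(n_bytes)


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Server side: checking an answer costs a single hash."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty_bits


def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Client side: brute-force a nonce; expected work is about 2**difficulty_bits hashes."""
    for nonce in itertools.count():
        if verify(challenge, nonce, difficulty_bits):
            return nonce


if __name__ == "__main__":
    challenge = make_challenge()
    difficulty = 16  # assumed value; each extra bit doubles the client's expected work
    nonce = solve(challenge, difficulty)
    print("nonce found:", nonce, "valid:", verify(challenge, nonce, difficulty))
```

In a scheme like this, raising the difficulty by one bit roughly doubles the CPU time a bot must burn per request while leaving the verifier's cost unchanged, which is the kind of cost asymmetry tarpitting relies on.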

Every minute we tie malicious bots up is a minute they're not harming the Internet as a whole. This means we aren't just protecting our customers but everyone online currently terrorized by malicious bots. The spirit of Cloudflare's Birthday Week has always been about giving back to the Internet as a whole, and we can think of no better gift than ridding the Internet of malicious bots.

Beyond just wasting bots’ time, we also want to get them shut down. If the infrastructure provider hosting the bot is part of the Bandwidth Alliance, we’ll share the bot’s IP address so they can shut down the bot completely. The Bandwidth Alliance allows us to reduce transit costs with partners and, with this launch, also helps us work together with them to make the Internet safer for legitimate users.

Generally, everyone we ran Bot Fight Mode by thought it was a great idea. The only objection we heard was that as we start forcing bots to solve CPU intensive challenges in the short term, before they just give up — which we think is inevitable in the long term — we may raise carbon emissions. To combat those emissions we’re committed to estimating the extra CPU utilized by these bots, calculating their carbon cost, and then planting trees to compensate and build a better future.

Planting Trees

Dealing with climate change requires multiple efforts by people and companies. Cloudflare announced earlier this year that we had expanded our purchasing of Renewable Energy Certificates (that previously covered our North American operations) to our entire global network of 194 cities.

To figure out how much tree planting we need to do we need to calculate the cost of the extra CPU used when making a bot work hard. Here’s how that will work.

Using a figure of 450 kg CO2/year (from https://www.goclimateneutral.org/blog/the-carbon-footprint-of-servers/) for the types of server that a bad bot might use (cloud server using a non-renewable energy source) we get about 8 kg CO2/year per CPU core. We are able to measure the time bots spend burning CPU and so we can directly estimate the amount of CO2 emitted by our fight back.

According to One Tree Planted, a single mature tree can absorb about 21 kg CO2/year. So, very roughly, each tree can absorb a year’s worth of CO2 from 2.5 CPU cores.

Since trees take time to mature, and given the scale of the climate change challenge, we’re going to pay to overplant trees. For every tree that we calculate we’d need to plant to sequester the CO2 emissions from fighting bots, we’re going to donate $25 to One Tree Planted to plant 25 trees.
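
The arithmetic above can be sketched in a few lines. This is a back-of-the-envelope illustration using only the figures quoted in this post (8 kg CO2/year per CPU core, 21 kg CO2/year per mature tree, and $25 to plant 25 trees for each tree required). The function names, the rounding up to whole trees, and the sample workload are assumptions for the example, not our actual accounting.

```python
import math

# Figures quoted in the post.
KG_CO2_PER_CORE_YEAR = 8.0       # emissions of one CPU core running for a year
KG_CO2_PER_TREE_YEAR = 21.0      # absorption of one mature tree per year
DOLLARS_PER_REQUIRED_TREE = 25   # donation for each tree the calculation calls for
TREES_PLANTED_PER_REQUIRED_TREE = 25

SECONDS_PER_YEAR = 365 * 24 * 3600


def co2_kg(cpu_core_seconds: float) -> float:
    """Estimate CO2 emitted by a measured amount of bot CPU time."""
    core_years = cpu_core_seconds / SECONDS_PER_YEAR
    return core_years * KG_CO2_PER_CORE_YEAR


def trees_required(kg_co2: float) -> int:
    """Trees needed to absorb the emissions in a year (rounding up is an assumption)."""
    return math.ceil(kg_co2 / KG_CO2_PER_TREE_YEAR)


def offset_plan(cpu_core_seconds: float) -> dict:
    """Combine the steps: CPU time -> CO2 -> required trees -> donation."""
    kg = co2_kg(cpu_core_seconds)
    required = trees_required(kg)
    return {
        "kg_co2": kg,
        "trees_required": required,
        "donation_usd": required * DOLLARS_PER_REQUIRED_TREE,
        "trees_planted": required * TREES_PLANTED_PER_REQUIRED_TREE,
    }


if __name__ == "__main__":
    # Hypothetical workload: bots kept busy for 100 core-years in total.
    print(offset_plan(100 * SECONDS_PER_YEAR))
```

For the hypothetical 100 core-years, this gives 800 kg of CO2, 39 required trees, and a $975 donation that plants 975 trees.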

And, of course, we’ll be handing the IPs of bad bots to our Bandwidth Alliance partners to get the bots shut down and remove their carbon cost completely. In the past, the tech community has largely defeated email spammers and DDoS-for-hire services by making their efforts fruitless. We think this is the right strategy to now defeat malicious bots once and for all.

Who Do Bots Hurt?

Malicious bots can cause significant harm to our customers’ infrastructure and often result in bad experiences for our customers’ users.

For example, a recent customer was being crippled by a credential stuffing attack that not only was attempting to compromise their users’ accounts but was doing so in such significant volume that it was effectively causing a small scale Denial of Service on all aspects of the customer’s website.

The malicious bot was overloading the customer’s conventional threat prevention infrastructure and we rapidly onboarded them as an Under Attack customer. As a part of the onboarding, we identified that the attack could be specifically thwarted using our Bot Management product while not impacting any legitimate user traffic.

Another trend we have seen is the growing combination of bots with botnets, particularly in the world of inventory hoarding bots. These bot operators are highly motivated and willing to spend.

The targets are generally goods in limited supply, high in demand, and high in value. Think sneakers, concert tickets, airline seats, and popular short-run Broadway musicals. Bot operators who are able to purchase those items at retail can charge massive premiums in aftermarket sales. When the operator identifies a target site, such as an ecommerce retailer, and a specific item, such as a new pair of sneakers going on sale, they can purchase time on the new Residential Proxy as a Service market to gain access to end user machines and (relatively) clean IPs from which to launch their attack.

They then utilize sophisticated techniques and triggers to change characteristics of the machine, network, and software they use to generate the attack through a very wide array of options and combinations, thwarting systems that rely on repetition or known patterns. This type of attack hurts multiple targets as well. The ecommerce site has real, frustrated users who can’t purchase the in-demand item. Those real users lose out on inventory to an attacker who is just there to skim off the largest profit possible. And the unwitting users who are part of the botnet have their resources, such as their home broadband connection, used without their consent or knowledge.

The bottom line is that bots hurt companies and their customers.

Summary

Cloudflare has fought malicious bots from the very beginning and over time has deployed more and more sophisticated methods to block them. Using the power of the over 20 million Internet properties we protect and accelerate, and our visibility of networks and users around the world, we have built machine learning models that sort the good bots from the bad and block the bad.

But bots continue to be a problem, and our new Bot Fight Mode will directly disincentivize bot writers from attacking customers. At the same time we don’t want to contribute to climate change and are offsetting the carbon cost of bots by planting trees to absorb carbon and help build a better future (and Internet).
