When someone mentions bots on the Internet, what’s your first reaction?
It’s probably negative. Most of us conjure up memories of CAPTCHAs, stolen passwords, or some other pain caused by bad bots.
But the truth is, there are plenty of well-behaved bots on the Internet. These include Google’s search crawler and Stripe’s payment bot. At Cloudflare, we manually “verify” good bots, so they don’t get blocked. Our customers can choose to allowlist any bot that is verified. Unfortunately, new bots are popping up faster than we can verify them. So today we’re announcing a solution: Friendly Bots.
Let’s begin with some background.
How does a bot get verified?
We often find good bots via our public form. Anyone can submit a bot, but we prefer that bot operators complete the form to provide us with the information we need. We ask for some standard bits of information: your bot’s name, its public documentation, and its user agent (or regex). Then, we ask for information that will help us validate your bot. There are four common methods:
IP list
Send us a list of IP addresses used by your bot. This doesn’t have to be a static list: you can point us to a dynamic page that changes over time. Just provide the URL, and we’ll fetch updates every day. These IPs must be publicly documented and exclusive to your bot. If you provide a shared IP address (like one used by a proxy service), our systems will flag it as risky and reject it. We want to avoid accidentally allowing other traffic.
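To make the IP list method concrete, here’s a minimal Python sketch of checking a client address against a published list. It assumes a plain-text list with one address or CIDR range per line (with `#` for comments); the function names are hypothetical, and this is an illustration, not how Cloudflare implements the check.

```python
import ipaddress
import urllib.request
from typing import Iterable, List, Union

IPNetwork = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def parse_ip_list(text: str) -> List[IPNetwork]:
    """Parse one address or CIDR range per line; skip blanks and '#' comments.

    The line-based format is an assumption; real operators publish in many formats.
    """
    networks: List[IPNetwork] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # strict=False tolerates entries such as "192.0.2.1/24" that have host bits set;
        # a bare address becomes a /32 (IPv4) or /128 (IPv6) network
        networks.append(ipaddress.ip_network(line, strict=False))
    return networks


def fetch_ip_list(url: str) -> List[IPNetwork]:
    """Download and parse a published list (meant to run on a schedule, e.g. daily)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_ip_list(resp.read().decode("utf-8"))


def ip_in_list(ip: str, networks: Iterable[IPNetwork]) -> bool:
    """True if the client address falls inside any published range."""
    addr = ipaddress.ip_address(ip)
    # Testing an IPv4 address against an IPv6 network (or vice versa) simply yields False
    return any(addr in net for net in networks)
```

For example, with a list containing `192.0.2.0/24`, a request from `192.0.2.17` matches, while one from `198.51.100.1` does not.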
Reverse DNS
This one is fun. You’ve heard of DNS: the phone book of the Internet, which maps domain names to IP addresses. Reverse DNS (rDNS) works in the opposite direction, allowing us to take an IP address and look up the domain name associated with it.
In other words: give us a hostname suffix, and in many cases we’ll be able to validate your bot’s identity!
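Here’s a minimal Python sketch of an rDNS check. It uses forward-confirmed reverse DNS, a common practice (Google documents it for verifying Googlebot): look up the PTR hostname for the IP, require it to end in the registered suffix, then resolve that hostname and confirm it points back to the same IP. This post doesn’t say whether Cloudflare performs the forward step; `verify_rdns` and its parameters are hypothetical, and the lookups are injectable so the logic can be tested without live DNS.

```python
import ipaddress
import socket
from typing import Callable, Iterable, Set


def _reverse_lookup(ip: str) -> str:
    # PTR lookup: IP address -> hostname (raises socket.herror if none exists)
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(host: str) -> Set[str]:
    # A/AAAA lookup: hostname -> every address it resolves to
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


def verify_rdns(
    ip: str,
    suffixes: Iterable[str],
    reverse_lookup: Callable[[str], str] = _reverse_lookup,
    forward_lookup: Callable[[str], Set[str]] = _forward_lookup,
) -> bool:
    """Forward-confirmed reverse DNS check (hypothetical helper, not Cloudflare's code)."""
    target = ipaddress.ip_address(ip)
    try:
        host = reverse_lookup(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror both subclass OSError
        return False
    wanted = [s.strip(".").lower() for s in suffixes]
    # Match on a label boundary so "evilexample.com" never passes for "example.com"
    if not any(host == s or host.endswith("." + s) for s in wanted):
        return False
    try:
        resolved = forward_lookup(host)
    except OSError:
        return False
    # The hostname must resolve back to the same address; otherwise anyone
    # who controls their own PTR record could claim the suffix.
    return any(ipaddress.ip_address(a) == target for a in resolved)
```

The forward step is what makes the check meaningful: whoever owns an IP block can set its PTR records to any hostname, but only the owner of the suffix’s domain controls where that hostname resolves.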
User agent + ASN validation
In some cases, we can verify bots that consistently come from the same network (identified by its autonomous system number, or “ASN”) with the same user agent. We can’t always do this, because traffic identified this way is easier to spoof, but we’re often confident enough to use it as a validation method.
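A user agent + ASN rule boils down to requiring both signals to agree. The Python sketch below is hypothetical: it assumes the client’s ASN has already been looked up from its IP address upstream, and the class name, the `ExampleMonitor` user agent, and the use of a private-use ASN are illustrative only.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class UaAsnRule:
    name: str
    user_agent_pattern: str  # regex searched for in the User-Agent header
    asns: FrozenSet[int]     # networks the bot is expected to come from

    def matches(self, user_agent: str, client_asn: int) -> bool:
        # Both signals must agree: a user agent alone is trivially spoofed,
        # and an ASN alone covers everyone else on that network.
        return (
            client_asn in self.asns
            and re.search(self.user_agent_pattern, user_agent) is not None
        )


# Hypothetical rule; 64496 is reserved for documentation (RFC 5398)
monitor = UaAsnRule("example-monitor", r"ExampleMonitor/\d+", frozenset({64496}))
```

Even with both checks, anyone else on the same network can send the same header, which is why this method is only used when the traffic pattern gives enough confidence.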
Machine learning
This is the flashiest method. Cloudflare sees 32+ million requests every second, and we’ve been able to feed those requests into a model that can accurately profile good bots. If the previous validation methods don’t work for you, there’s a good chance we can use ML to spot your bot. But we need enough traffic (thousands of requests) to detect a usable pattern.
We usually approve Verified Bot requests within a few weeks, after taking some time to quality test and ensure everything is safe. But we often have to reserve this process for trusted partners and larger bots, even though plenty of our users still need their own bots allowlisted.
What if my bot isn’t a huge global service?
We keep our ears open (and our eyes on Twitter), so we know that folks want their own “personal” version of Verified Bots.
For example: let’s say you built your own monitoring service that crawls a few of your personal websites. It doesn’t make sense for us to verify this bot, because it doesn’t meet our criteria. A Verified Bot must:
- Serve the broader Internet.
- Objectively demonstrate good behavior.
- Comply with Internet standards like robots.txt.
It’s your bot (and to you, it might be good!), but our other users might feel differently. Imagine if someone else’s bot could waltz into your infrastructure at any time!
Here’s another case. Perhaps Cloudflare has labeled a particular proxy as automated, possibly because a mix of humans and bots use the proxy to access the Internet. You may want to allow this traffic on your site without affecting other Cloudflare customers.
Lastly, if you work at a startup, your company may run automated services that haven’t reached the scale we require. But you still need a way to allowlist these services.
Announcing Friendly Bots
The bots described above, especially common services, are not bad. They deserve to sit in a state between bad and verified. They’re friendly.
And we’ve come up with a really cool way to help you manage them.
Our new feature, Friendly Bots, allows you to instantly auto-validate any traffic with the help of IP lists, rDNS, and more.
Here’s how it works: in the Cloudflare dashboard, tell us about your bot. You can point us toward a public IP list, give us a hostname suffix, or even select other methods like machine learning. Cloudflare’s anycast network allows us to run all of these mechanisms at each one of our data centers. This means you’ll have performant, secure, and scalable bot verification.
Build a collection of Friendly Bots and share them between your sites, creating custom policies that allow, rate limit, or log this type of traffic. You may just want to keep tabs on a particular bot; that’s fine. The response options are flexible and directly integrate with our Workers platform.
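As a rough mental model, a collection of Friendly Bots is a list of validators, each paired with a response. The Python sketch below is hypothetical: the class names, `evaluate`, and the first-match ordering are assumptions, not Cloudflare’s configuration format. In practice, each validator would be one of the methods above (IP list, rDNS, user agent + ASN).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Tuple


class Action(Enum):
    ALLOW = "allow"
    RATE_LIMIT = "rate_limit"
    LOG = "log"


@dataclass(frozen=True)
class Request:
    ip: str
    user_agent: str


@dataclass(frozen=True)
class FriendlyBot:
    name: str
    # Any validation method: IP list, rDNS, user agent + ASN, ...
    validate: Callable[[Request], bool]
    action: Action


def evaluate(request: Request, bots: List[FriendlyBot]) -> Optional[Tuple[str, Action]]:
    """Return (bot name, action) for the first matching Friendly Bot, else None."""
    for bot in bots:
        if bot.validate(request):
            return bot.name, bot.action
    # Not a Friendly Bot: fall through to the site's normal bot handling
    return None
```

Because the collection is just a list, the same one can be reused across several sites, which mirrors the idea of sharing Friendly Bots between your zones.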
In the past, we’ve struggled to verify bots that did not crawl the web at a large scale. Why? Our system relies on a cache of verified traffic, ensuring that certain IPs or other data have widely shown good behavior on the Internet. This means that bots were sometimes difficult to verify if they did not make thousands of requests to Cloudflare. With Friendly Bots, we’ve eliminated that requirement, introducing a new, dynamic cache that optimizes for fun-sized projects.
The downstream benefits
Friendly Bots will streamline your dashboard experience. But there are a few hidden, downstream benefits we want to highlight:
Admittedly, it’s challenging to keep up with all the good bots on the Internet. In order to verify a bot, we’ve relied on manual submissions that may come weeks or even months after a good bot is created. Friendly Bots will change all of that. If we notice many of our customers allowlisting a particular bot (say, a certain IP address or hostname suffix), our systems will automatically queue that bot for verification. We can intelligently use your Friendly Bots to help the rest of Cloudflare’s customers.
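One simple way to spot candidates is to count how many distinct zones allowlist the same identifier. This Python sketch assumes allowlist entries arrive as `(zone_id, identifier)` pairs and uses an arbitrary threshold; both are hypothetical, and this post doesn’t describe how Cloudflare’s systems actually make the decision.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def candidates_for_verification(
    allowlist_entries: Iterable[Tuple[str, str]],
    min_zones: int = 50,  # hypothetical threshold
) -> List[str]:
    """Return identifiers (IP ranges, hostname suffixes, ...) allowlisted by many zones."""
    zones_per_identifier: Dict[str, Set[str]] = defaultdict(set)
    for zone_id, identifier in allowlist_entries:
        # Normalize case so "BOT.example.com" and "bot.example.com" are one identifier
        zones_per_identifier[identifier.lower()].add(zone_id)
    # Counting distinct zones means duplicate entries within one zone count only once
    return sorted(
        ident for ident, zones in zones_per_identifier.items() if len(zones) >= min_zones
    )
```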
In the past, users have been confused by the verification process. Do I need to provide documentation for my IPs? What about my user agent: can it change over time? If any piece of the validation data was broken, it could take us weeks to identify and fix.
That’s no longer the case. With Friendly Bots, we perform validation almost instantly. So if something isn’t right — perhaps your rDNS validation uses the wrong hostname — you’ll know immediately because the bot won’t be allowlisted. No more waiting to hear from our support team.
Previously, we required bot operators (e.g., Google) to submit verification data themselves. If there was a bot you wanted to verify, but did not own, you were out of luck.
Friendly Bots eliminates this dependency on bot operators. Anyone who can find identifying information can register a bot on their site.
If a scraper shows up on your site, is that a good thing? To some, yes, because it’s exposure. To others, no, because that scraper may take data. This is a question we’ve carefully considered with every Verified Bots submission to date.
Now: it’s your choice to make. Friendly Bots puts the control in your hands, allowing you to categorize bots at a domain level. We’ll continue to verify bots at a global level (when behavior is objectively good).
Here’s a fun bonus: in addition to today’s Friendly Bots announcement, we’re also making some changes to Cloudflare Radar.
Beginning immediately, you can see a list of many Verified Bots in Radar. This is exciting; we’ve never published a detailed list like this before.
All data is updated in real time. As we verify new bots, they will appear in the Radar module.
We’re also beginning to add specific Verified Bots to our Logs product. You’ll see them as Bot Tags, so a request might include the string “pinterest” if it came from Pinterest’s bot.
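If you export logs as newline-delimited JSON, filtering by Bot Tag takes only a few lines of Python. The field name `BotTags` (assumed to hold a list of strings) and the `ClientIP` field in the test data are assumptions here; check the field reference for your log dataset before relying on them.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def requests_with_bot_tag(
    lines: Iterable[str], tag: str, field: str = "BotTags"
) -> Iterator[Dict[str, Any]]:
    """Yield log records whose bot-tag list contains `tag` (field name is an assumption)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        # Records without the field, or with null, are treated as untagged
        if tag in (record.get(field) or []):
            yield record
```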
Our team is excited to launch Friendly Bots soon. We anticipate the impact will radiate throughout Bot Management, reducing false positives, improving crawlability, and generally stabilizing sites.
If you have Bot Management and want to give this new feature a try, please tell your account team (and we’ll be sure to include you in the early access period). You can also continue to tell us about bots that should be verified.