
From Googlebot to GPTBot: who’s crawling your site in 2025

2025-07-01

10 min read

Web crawlers are not new. The World Wide Web Wanderer debuted in 1993, though the first web search engines to truly use crawlers and indexers were JumpStation and WebCrawler. Crawlers are part of one of the backbones of the Internet’s success: search. Their main purpose has been to index the content of websites across the Internet so that those websites can appear in search engine results and direct users appropriately. In this blog post, we’re analyzing recent trends in web crawling, which now has a crucial and complex new role with the rise of AI.

Not all crawlers are the same. Bots, automated scripts that perform tasks across the Internet, come in many forms: those considered non-threatening or “good” (such as API clients, search indexing bots like Googlebot, or health checkers) and those considered malicious or “bad” (like those used for credential stuffing, spam, or scraping content without permission). In fact, according to Cloudflare Radar data, around 30% of global web traffic today comes from bots, and in some locations bot traffic even exceeds human Internet traffic.

A new category, AI crawlers, has emerged in recent years. These bots collect data from across the web to train AI models, improving tools and experiences, but also raising issues around content rights, unauthorized use, and infrastructure overload. We aimed to confirm the growth of both search and AI crawlers, examine specific AI crawlers, and understand broader crawler usage.

This is increasingly relevant with the rapid adoption of AI, growing content rights concerns, and data privacy discussions. Some sites and creators are looking to limit or block AI crawlers using tools like robots.txt or firewall rules. Others, like Dutch indie maker and entrepreneur Pieter Levels, have embraced them: “I’m 100% fine with AI crawlers… very important to rank in LLMs [large language models]”.

It’s important to note that crawlers serve different purposes. For example, the facebookexternalhit bot is not included in this analysis, as it is used by Facebook to fetch page content when generating previews for shared links. In this post, we focus only on AI and search crawlers that index or scrape website content.

AI-only crawlers perspective

Let’s start with the AI-only crawler perspective currently available on Cloudflare Radar, which covers only crawlers advertised as AI-related. To identify them, we use a list derived from an open-source project that helps website owners manage and control access to AI crawlers, especially those used to train large language models (LLMs). The project also provides guidance on what to include in robots.txt files (more on that below). The data shown below is based on matching those crawler names against user-agent strings in HTTP requests. (Further details about this method, including one exception, can be found at the end of the blog post.)
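To make the matching step concrete, here is a minimal Python sketch of substring matching between crawler names and user-agent strings. It is not the pipeline behind the Radar data: the token list is a small illustrative subset, the function names are hypothetical, and the sample user-agent strings are simplified for the example.

```python
from collections import Counter
from typing import Dict, List, Optional

# Illustrative subset of crawler names (assumption: the real list is longer).
AI_CRAWLER_TOKENS = ["GPTBot", "ClaudeBot", "Meta-ExternalAgent", "Amazonbot", "Bytespider"]


def match_ai_crawler(user_agent: str) -> Optional[str]:
    """Return the first known crawler name found in the User-Agent, else None."""
    ua = user_agent.lower()
    for token in AI_CRAWLER_TOKENS:
        if token.lower() in ua:
            return token
    return None


def crawler_shares(user_agents: List[str]) -> Dict[str, float]:
    """Percentage share of matched requests per crawler (unmatched requests are ignored)."""
    counts: Counter = Counter()
    for ua in user_agents:
        bot = match_ai_crawler(ua)
        if bot is not None:
            counts[bot] += 1
    total = sum(counts.values())
    return {bot: 100 * n / total for bot, n in counts.items()} if total else {}


# Simplified, made-up request log for illustration only.
sample_log = [
    "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)",
    "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)",
    "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/126.0",
]
print(crawler_shares(sample_log))
```

Matching is case-insensitive here because user-agent casing varies in practice; that is a design choice of this sketch, not something the post specifies.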

The AI crawler landscape saw a significant shift between May 2024 and May 2025, with GPTBot (from OpenAI) emerging as the dominant force, surging from 5% to 30% share, and Meta-ExternalAgent (from Meta) making a strong new entry at 19%. This growth came at the expense of former leader Bytespider, which plummeted from 42% to 7%, as well as other AI crawlers like ClaudeBot and Amazonbot, which also saw declines. Our data clearly indicates a reordering of top AI crawlers, highlighting the increasing prominence of OpenAI and Meta in this category.

| Rank | Bot name (May 2024) | Share (May 2024) | Bot name (May 2025) | Share (May 2025) |
|------|---------------------|------------------|---------------------|------------------|
| 1 | Bytespider | 42% | GPTBot | 30% |
| 2 | ClaudeBot | 27% | ClaudeBot | 21% |
| 3 | Amazonbot | 21% | Meta-ExternalAgent | 19% |
| 4 | GPTBot | 5% | Amazonbot | 11% |
| 5 | Applebot | 4.1% | Bytespider | 7.2% |

For additional context, the list below includes further information about the bots with higher crawling shares seen above. This information comes from the same open-source list mentioned above and from publications by companies like OpenAI, which explain how their crawlers are used. 

  • GPTBot – OpenAI’s crawler used to improve and train large language models like ChatGPT.

  • ClaudeBot – Anthropic’s crawler for training and updating the Claude AI assistant.

  • Meta-ExternalAgent – Meta’s bot likely used for collecting data to train or fine-tune LLMs.

  • Amazonbot – Amazon’s crawler that gathers data for its search and AI applications.

  • Bytespider – ByteDance’s AI data collector, often linked to training ByteDance’s own models (such as Doubao) and TikTok-related AI.

  • Applebot – Apple’s web crawler primarily for Siri and Spotlight search, possibly used in AI development.

  • OAI-SearchBot – OpenAI’s search-focused crawler, likely used for retrieving real-time web info for models.

  • ChatGPT-User – Represents API-based or browser usage of ChatGPT in connection with user interactions.

  • PerplexityBot – Crawler from Perplexity.ai, which powers their AI answer engine using real-time web data.

Webmasters can tell crawler operators whether they want these bots to access their content by setting rules in a file called robots.txt, which tells crawlers which pages they should or shouldn’t access. As we’ve seen recently, honoring robots.txt policies is voluntary for crawlers, but Cloudflare has announced tools like AI Audit to help content creators enforce them.
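As an illustration of how such rules are read, the sketch below uses Python’s standard-library `urllib.robotparser` to check what a well-behaved crawler would conclude from a robots.txt file. The file contents, the `can_crawl` helper, and the example.com URLs are made up for the example; real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block GPTBot everywhere, keep /private/ off-limits to everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for agent, url in [
    ("GPTBot", "https://example.com/blog/post"),
    ("Googlebot", "https://example.com/blog/post"),
    ("Googlebot", "https://example.com/private/report"),
]:
    print(agent, url, can_crawl(ROBOTS_TXT, agent, url))
```

Note that this only models the voluntary signal: nothing in the file prevents a crawler from ignoring it.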

Now, as we’ve seen, the landscape of web crawling is evolving rapidly, driven by the merging roles of search engines and AI. AI is now deeply integrated into search, seen in Google’s AI Overviews and AI Mode, but also in social media platforms, like Meta AI on Instagram. So, let's broaden our analysis to include these wider AI-driven crawling activities.

General AI and search crawling growth: +18%

A broader view reveals the growth of crawling traffic from both search and AI crawlers over the first few months of 2025. To remove customer growth bias, we'll analyze trends using a fixed set of customers from specific weeks (a method we’ve used in our Cloudflare Radar Year in Review): the first week of May 2024, a week in November 2024, and the first week of April 2025. 

Using that method, we found that AI and search crawler traffic grew by 18% from May 2024 to May 2025 (comparing full-month periods). The increase was even higher, at 48%, when including new Cloudflare customers added during that time. Peak AI and search crawling traffic occurred in April 2025, with a 32% increase compared to May 2024. This confirms that crawling traffic has clearly risen over the past year, but also that growth is not always constant. Google remains the dominant player, and its share is growing too, as we’ll see in the next section.
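The difference between the two figures comes down to which customers are counted. The Python sketch below shows the idea with made-up request counts chosen only to mirror the percentages above; it assumes the fixed set is simply the customers present in both periods, which may not match the exact method used for this analysis.

```python
def growth_pct(before: float, after: float) -> float:
    """Percentage growth from before to after."""
    return 100 * (after - before) / before


def cohort_growth(before: dict, after: dict) -> tuple:
    """Return (fixed-cohort growth %, all-customers growth %).

    before/after map a customer name to its crawler request count in each period.
    """
    fixed = before.keys() & after.keys()  # customers present in both periods
    fixed_growth = growth_pct(
        sum(before[c] for c in fixed), sum(after[c] for c in fixed)
    )
    total_growth = growth_pct(sum(before.values()), sum(after.values()))
    return fixed_growth, total_growth


# Made-up request counts, for illustration only.
may_2024 = {"site-a": 100, "site-b": 200}
may_2025 = {"site-a": 120, "site-b": 234, "site-c": 90}  # site-c is a new customer

fixed_growth, total_growth = cohort_growth(may_2024, may_2025)
print(f"fixed cohort: {fixed_growth:.0f}%, including new customers: {total_growth:.0f}%")
```

Holding the cohort fixed removes growth that comes purely from new sites joining, so the lower figure is the better estimate of how crawler behavior itself changed.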

As the next chart shows, crawling traffic increased sharply in March and April 2025 and remained high, though slightly lower, in May.

The patterns in the crawling chart above also seem to reflect broader seasonal patterns and general human Internet traffic patterns. In 2024, traffic dropped during the summer in the Northern Hemisphere, with August and September being the least active months. And like overall Internet traffic, it then rose in November, when people are typically more online due to shopping and seasonal habits, as we've seen in past analyses.

Googlebot crawling grew 96% in one year

Googlebot, which indexes content for Google Search, was clearly the top crawler throughout the period and showed strong growth, up 96% from May 2024 to May 2025, reflecting increased crawling by Google. Crawling traffic peaked in April 2025, reaching 145% higher than in May 2024. It's also important to mention that Google made changes to its search and launched AI Overviews in its search engine during this time — first in the US in May 2024, then in more countries later.

Two trends stand out when looking at daily data for Google-related crawlers, as shown in the graph below. First, Googlebot and the more recent GoogleOther (a web crawler from 2023 for “research and development”) account for most of Google’s crawling activity. Second, there were two visible drops in crawling traffic: one on December 14, 2024 (around a Google Search update), and another from May 20 to May 28, 2025. That May 20 drop occurred around the same time as the rollout of AI Mode on Google Search in the US, although the timing may be coincidental.

Breakdown of top 20 AI and search web crawlers 

Ranking crawlers by their share of total requests gives a clearer picture of which bots are gaining or losing ground, especially among those focused on search and AI. The table below shows a clear trend: some AI bots have grown rapidly since last year (with growth beginning even earlier), while many traditional search crawlers have remained flat or lost share (as in the case of Bing and its Bingbot crawler). The main exception is Googlebot.

The next table shows the percentage share of each crawler out of all crawling traffic generated by this specific cohort of over 30 AI & search crawlers observed by Cloudflare in May 2024 and May 2025. The table below also includes the change in percentage points and the growth or decline in raw request volume. Crawlers are ranked by their share in May 2025. Key crawler shifts include GPTBot rising sharply (+305%), while Bytespider dropped dramatically (-85%).

| Rank | Bot name | Share May 2024 | Share May 2025 | Δ percentage-point change | Raw requests growth (May 2024 to May 2025) |
|------|----------|----------------|----------------|---------------------------|--------------------------------------------|
| 1 | Googlebot | 30% | 50% | +20 pp | 96% |
| 2 | Bingbot | 10% | 8.7% | -1.3 pp | 2% |
| 3 | GPTBot | 2.2% | 7.7% | +5.5 pp | 305% |
| 4 | ClaudeBot | 11.7% | 5.4% | -6.3 pp | -46% |
| 5 | GoogleOther | 4.4% | 4.3% | -0.1 pp | 14% |
| 6 | Amazonbot | 7.6% | 4.2% | -3.4 pp | -35% |
| 7 | Googlebot-Image | 4.5% | 3.3% | -1.2 pp | -13% |
| 8 | Bytespider | 22.8% | 2.9% | -19.8 pp | -85% |
| 9 | Yandex | 2.8% | 2.2% | -0.7 pp | -10% |
| 10 | ChatGPT-User | 0.1% | 1.3% | +1.2 pp | 2,825% |
| 11 | Applebot | 1.9% | 1.2% | -0.7 pp | -26% |
| 12 | Timpibot | 0.3% | 0.6% | +0.3 pp | 133% |
| 13 | Baiduspider | 0.5% | 0.4% | -0.1 pp | 7% |
| 14 | PerplexityBot | <0.01% | 0.2% | +0.2 pp | 157,490% |
| 15 | DuckDuckBot | 0.2% | 0.1% | -0.1 pp | -16% |
| 16 | SeznamBot | 0.1% | 0.1% | | 2% |
| 17 | Yeti | 0.1% | 0.1% | | 47% |
| 18 | coccocbot | 0.1% | 0.1% | | -3% |
| 19 | Sogou | 0.1% | 0.1% | | -22% |
| 20 | Yahoo! Slurp | 0.1% | 0.0% | -0.1 pp | -8% |

Based on this data, two major shifts in web crawling occurred between May 2024 and May 2025:

1. Some AI crawlers rose sharply. GPTBot (from OpenAI) increased its share from 2.2% to 7.7% (+5.5 pp), with a 305% rise in requests. This underscores the data demand for training large language models like ChatGPT. GPTBot jumped from #9 in May 2024 to #3 in May 2025.

Another OpenAI crawler, ChatGPT-User, saw requests surge by 2,825%, reaching a 1.3% share. This reflects a large rise in ChatGPT user activity or API-based interactions that involve accessing web content. PerplexityBot (from Perplexity.ai), despite a small 0.2% share, recorded the highest growth rate: a staggering 157,490% increase in raw requests.

Meanwhile, some AI crawlers saw steep declines. ClaudeBot (Anthropic) fell from 11.7% to 5.4% of total traffic and dropped 46% in requests. Bytespider plummeted 85% in request volume, falling from #2 to #8 in crawler share (now at just 2.9%).

Both Amazonbot and Applebot, also considered AI crawlers, saw decreases in share and in raw requests (–35% and –26%, respectively).

2. Google’s dominance expanded. Googlebot’s share rose from 30% to 50%, supporting search indexing but potentially also serving AI-related purposes (such as the new AI Overviews in Google Search). GoogleOther (the crawler introduced in 2023) also increased its crawling traffic, by 14%. Other Google crawlers not in the top 20, like Googlebot-News, also grew significantly (+71% in requests). There’s a clear trend of growth in these Google-related web crawlers at a time when the company is investing heavily in combining AI with search.

Also in the search category, Bingbot’s share (from Microsoft) declined slightly from 10% to 8.7% (-1.3 pp), though its raw requests still grew modestly by 2%.
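A crawler can lose share while its requests grow because share is measured against a total that also grew. The short Python sketch below makes that arithmetic explicit using the rounded figures from the table; the function name is hypothetical, and because the inputs are rounded, the results are only approximate.

```python
def implied_total_growth(share_before: float, share_after: float, request_growth_pct: float) -> float:
    """Growth (%) in total crawl requests implied by one crawler's figures.

    If a crawler's requests grew by g while its share moved from s0 to s1,
    then total_after / total_before = s0 * (1 + g) / s1.
    """
    ratio = share_before * (1 + request_growth_pct / 100) / share_after
    return 100 * (ratio - 1)


# (bot, share May 2024 in %, share May 2025 in %, raw request growth in %)
rows = [("Googlebot", 30.0, 50.0, 96.0), ("Bingbot", 10.0, 8.7, 2.0)]
for bot, s0, s1, growth in rows:
    print(f"{bot}: implies total crawling grew about {implied_total_growth(s0, s1, growth):.1f}%")
```

Both rows imply total growth of roughly 17% to 18%, in line with the 18% overall increase reported earlier, which is why Bingbot’s 2% growth still translates into a smaller share.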

These trends show that web crawling is increasingly dominated by bots from Google and OpenAI, reflecting clear shifts over the course of a year. Google also appears to be adapting how it collects data to support both traditional search and AI-driven features.

Also worth noting is FriendlyCrawler, which no longer appears in the top 20 list as of May 2025 (now ranked #35). It was #14 in May 2024 with a 0.2% share, but saw a 100% drop in requests by May 2025. This bot is known to index and analyze website content, although its owner and purpose remain unclear. Typically, crawlers like this are used for improving search results, market research, or analytics.

robots.txt & AI bots: GPTBot leads twice

Recent data from June 6, 2025, from Cloudflare Radar shows that out of 3,816 domains (from the top 10,000) where we were able to find a robots.txt file, 546 (about 14%) had “allow” or “disallow” (fully or partially) directives targeting AI bots in particular.

This leaves many site owners in a gray area because it’s not always clear how effective robots.txt is in managing AI crawlers. Some site owners may not think to use it specifically for AI bots, while others might be unsure whether these bots even respect robots.txt rules, especially newer or less transparent crawlers. In other cases, sites use partial rules to fine-tune access, trying to balance visibility and protection without fully opting in or out.

The “disallow” rules appear far more often than “allow” rules. The most frequently blocked bot was GPTBot, disallowed by 312 domains (250 fully, 62 partially), followed by CCBot and Google-Extended, as shown in the following graph.

Although GPTBot was the most blocked, it was also the most explicitly allowed, with 61 domains granting access (18 fully, 43 partially). Still, very few sites openly and explicitly allow AI bots, and when they do, it’s usually for limited sections. Note that bots not listed in a site’s robots.txt are effectively allowed by default.
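The distinction between fully and partially disallowed (or allowed) can be sketched in a few lines of Python. This is a simplified reading of robots.txt, not the classifier behind the Radar figures: the function names are hypothetical, only groups that name the bot explicitly are considered (wildcard `*` groups are skipped), and “full” is taken to mean a rule for the path `/`.

```python
from typing import List, Optional, Tuple


def rules_for_bot(robots_txt: str, bot: str) -> List[Tuple[str, str]]:
    """Collect (field, path) allow/disallow rules from groups that name the bot explicitly."""
    bot = bot.lower()
    agents: List[str] = []   # user-agents of the group currently being read
    reading_agents = False   # True while consecutive User-agent lines are being read
    rules: List[Tuple[str, str]] = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                agents = []  # a new group starts here
            agents.append(value.lower())
            reading_agents = True
        elif field in ("allow", "disallow"):
            reading_agents = False
            if bot in agents and value:  # an empty value restricts nothing
                rules.append((field, value))
    return rules


def classify(robots_txt: str, bot: str, field: str) -> Optional[str]:
    """Return 'full', 'partial', or None for the given field ('allow' or 'disallow')."""
    paths = [path for f, path in rules_for_bot(robots_txt, bot) if f == field]
    if not paths:
        return None
    return "full" if "/" in paths else "partial"


# Made-up robots.txt for illustration.
SAMPLE = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: ClaudeBot
Disallow: /drafts/
Allow: /blog/

User-agent: *
Disallow: /admin/
"""
for name in ("GPTBot", "ClaudeBot", "Googlebot"):
    print(name, classify(SAMPLE, name, "disallow"), classify(SAMPLE, name, "allow"))
```

In this sample, Googlebot gets no classification because no group names it, which matches the point above that unlisted bots are effectively allowed by default.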

As AI crawling increases, more websites are moving from passive signals like robots.txt to active protections like Web Application Firewalls. The ecosystem is shifting, with a growing focus on enforceable controls.

Note: When we analyze crawler traffic, we compare user-agent tokens found in robots.txt files (like those for AI crawlers) with the actual user-agent strings in HTTP requests. It's important to note that some robots.txt tokens, such as Google-Extended, aren't user-agent substrings. As described in RFC 9309, one goal of these tokens may be to signal the purpose of the crawler. For instance, Google uses Google-Extended in robots.txt to determine whether your content can be used for AI training, but the traffic itself still comes from standard Google user-agents like Googlebot. Because of this, not every robots.txt entry will have a direct match in HTTP request logs.

Conclusion

As AI crawlers reshape the Internet, websites face both new challenges and new opportunities in managing their online presence.

This analysis highlights the growing impact of AI on web crawling, showing a clear shift from traditional search indexing to data collection for training AI models. The detailed statistics, such as Googlebot’s continued growth and the rapid rise of AI-specific crawlers, offer context for understanding how this space is evolving and what it means for the future of web content access.

The trend toward stronger, enforceable blocking methods, something Cloudflare has also been investing in, signals a key shift in how websites may control their interactions with AI systems going forward.

João Tomé|@emot
Cloudflare|@cloudflare
