
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Fri, 03 Apr 2026 17:03:42 GMT</lastBuildDate>
        <item>
            <title><![CDATA[Why we're rethinking cache for the AI era]]></title>
            <link>https://blog.cloudflare.com/rethinking-cache-ai-humans/</link>
            <pubDate>Thu, 02 Apr 2026 13:00:00 GMT</pubDate>
            <description><![CDATA[ The explosion of AI-bot traffic, representing over 10 billion requests per week, has opened up new challenges and opportunities for cache design. We look at some of the ways AI bot traffic differs from humans, how this impacts CDN cache, and some early ideas for how Cloudflare is designing systems to improve the AI and human experience. ]]></description>
            <content:encoded><![CDATA[ <p>Cloudflare data shows that 32% of traffic across our network originates from <a href="https://radar.cloudflare.com/traffic"><u>automated traffic</u></a>. This includes search engine crawlers, uptime checkers, ad networks — and more recently, AI assistants looking to the web to add relevant data to their knowledge bases as they generate responses with <a href="https://developers.cloudflare.com/reference-architecture/diagrams/ai/ai-rag/"><u>retrieval-augmented generation</u></a> (RAG). Unlike typical human behavior, <a href="https://www.cloudflare.com/learning/ai/what-is-agentic-ai/"><u>AI agents</u></a>, crawlers, and scrapers’ automated behavior may appear aggressive to the server responding to the requests. </p><p>For instance, AI bots frequently issue high-volume requests, often in parallel. Rather than focusing on popular pages, they may access rarely visited or loosely related content across a site, often in sequential, complete scans of the websites. For example, an AI assistant generating a response may fetch images, documentation, and knowledge articles across dozens of unrelated sources.</p><p>Although Cloudflare already makes it easy to <a href="https://blog.cloudflare.com/introducing-ai-crawl-control/"><u>control and limit</u></a> automated access to your content, many sites may <i>want</i> to serve AI traffic. For instance, an application developer may want to guarantee that their developer documentation is up-to-date in foundational AI models, an e-commerce site may want to ensure that product descriptions are part of LLM search results, or publishers may want to get paid for their content through mechanisms such as <a href="https://blog.cloudflare.com/introducing-pay-per-crawl/"><u>pay per crawl</u></a>.</p><p>Website operators therefore face a dichotomy: tune for AI crawlers, or for human traffic. 
Because the two exhibit widely different traffic patterns, current cache architectures force operators to choose one approach to save resources.</p><p>In this post, we’ll explore how AI traffic impacts storage cache, describe some of the challenges in mitigating this impact, and propose directions the community might take in adapting CDN caches to the AI era.</p><p>This work is a collaborative effort with a team of researchers at <a href="https://ethz.ch/en.html"><u>ETH Zurich</u></a>. The full version of this work was published at the 2025 <a href="https://acmsocc.org/2025/index.html"><u>Symposium on Cloud Computing</u></a> as “<a href="https://dl.acm.org/doi/10.1145/3772052.3772255"><u>Rethinking Web Cache Design for the AI Era</u></a>” by Zhang et al.</p>
    <div>
      <h3>Caching </h3>
      <a href="#caching">
        
      </a>
    </div>
    <p>Let's start with a quick refresher on <a href="https://www.cloudflare.com/learning/cdn/what-is-caching/"><u>caching</u></a>. When a user initiates a request for content on their device, it’s usually sent to the Cloudflare data center closest to them. When the request arrives, we check to see if we have a valid cached copy. If we do, we can serve the content immediately, resulting in a fast response and a happy user. If the content isn't available in our cache (a "cache miss"), our data centers reach out to the <a href="https://www.cloudflare.com/learning/cdn/glossary/origin-server/"><u>origin server</u></a> to get a fresh copy, which then stays in our cache until it expires or other data pushes it out.</p><p>Keeping the right elements in our cache is critical for reducing our cache misses and providing a great user experience — but what’s “right” for human traffic may be very different from what’s right for AI crawlers!</p>
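The hit/miss flow above can be sketched in a few lines. This is a minimal, hypothetical illustration of a cache in front of an origin using a common "least recently used" eviction policy, not Cloudflare's implementation:

```python
from collections import OrderedDict

def fetch_from_origin(url):
    # Placeholder for the (slow) round trip to the origin server.
    return f"content of {url}"

class LRUCache:
    """Minimal LRU cache: on a miss, fetch from the origin and, if the
    cache is full, evict the least recently used entry to make room."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()
        self.hits = self.misses = 0

    def get(self, url):
        if url in self.store:
            self.hits += 1
            self.store.move_to_end(url)     # mark as most recently used
            return self.store[url]          # cache hit: fast response
        self.misses += 1                    # cache miss: go to origin
        content = fetch_from_origin(url)
        self.store[url] = content
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used
        return content

cache = LRUCache(capacity=2)
cache.get("/a")  # miss: fetched from origin
cache.get("/a")  # hit: served from cache
cache.get("/b")  # miss
cache.get("/c")  # miss: cache full, so /a (least recently used) is evicted
print(cache.hits, cache.misses)  # 1 3
```

A hit is served immediately; every miss both delays the response and loads the origin, which is why keeping the right content cached matters so much.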
    <div>
      <h3>AI traffic at Cloudflare</h3>
      <a href="#ai-traffic-at-cloudflare">
        
      </a>
    </div>
    <p>Here, we’ll focus on AI crawler traffic, which has emerged as the most active AI bot type <a href="https://blog.cloudflare.com/crawlers-click-ai-bots-training/"><u>in recent analyses</u></a>, accounting for 80% of the self-identified AI bot traffic we see. AI crawlers fetch content to support real-time AI services, such as answering questions or summarizing pages, as well as to harvest data to build large training datasets for models like <a href="https://www.cloudflare.com/learning/ai/what-is-large-language-model/"><u>LLMs</u></a>.</p><p>From <a href="https://radar.cloudflare.com/ai-insights"><u>Cloudflare Radar</u></a>, we see that the vast majority of single-purpose AI bot traffic is for training, with search as a distant second. (See <a href="https://blog.cloudflare.com/ai-crawler-traffic-by-purpose-and-industry/"><u>this blog post</u></a> for a deeper discussion of the AI crawler traffic we see at Cloudflare.)</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3WQUiQ36rvMb8rNKruwdLd/1e9003057720b68829c6df3337a840ec/image2.png" />
          </figure><p>While both search and training crawls impact cache through numerous sequential, long-tail accesses, training traffic has properties such as high unique URL ratio, content diversity, and crawling inefficiency that make it even more impactful on cache.</p>
    <div>
      <h3>How does AI traffic differ from other traffic for a CDN?</h3>
      <a href="#how-does-ai-traffic-differ-from-other-traffic-for-a-cdn">
        
      </a>
    </div>
    <p>AI crawler traffic has three main differentiating characteristics: high unique URL ratio, content diversity, and crawling inefficiency.</p><p><a href="https://commoncrawl.github.io/cc-crawl-statistics/plots/crawlsize"><u>Public crawl statistics</u></a> from <a href="https://commoncrawl.org/"><u>Common Crawl</u></a>, which performs large-scale web crawls on a monthly basis, show that over 90% of pages are unique by content. Different AI crawlers also target <a href="https://blog.cloudflare.com/ai-bots/"><u>distinct content types</u></a>: e.g., some specialize in technical documentation, while others focus on source code, media, or blog posts. Finally, AI crawlers do not necessarily follow optimal crawling paths. A substantial fraction of fetches from popular AI crawlers results in 404 errors or redirects, <a href="https://dl.acm.org/doi/abs/10.1145/3772052.3772255"><u>often due to poor URL handling</u></a>. The rate of these ineffective requests varies depending on how well the crawler is tuned to target live, meaningful content. AI crawlers also typically do not employ browser-side caching or session management in the same way human users do. AI crawlers can launch multiple independent instances, and because they don’t share sessions, each may appear as a new visitor to the CDN, even if all instances request the same content.</p><p>Even a single AI crawler is likely to dig deeper into websites and <a href="https://dl.acm.org/doi/epdf/10.1145/3772052.3772255"><u>explore a broader range of content than a typical human user.</u></a> Usage data from Wikipedia shows that <b>pages once considered "long-tail" or rarely accessed are now being frequently requested, shifting the distribution of content popularity within a CDN's cache.</b> In fact, AI agents may iteratively loop to refine search results, scraping the same content repeatedly. Modeling this behavior shows that the iterative looping leads to low content reuse and broad coverage.</p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7yH1QLIGCU3mJGXID27Cik/3ba56ff02865b7b141743815d0909be0/image1.png" />
          </figure><p>Our modeling of AI agent behavior shows that as agents iteratively loop to refine search results (a common pattern for retrieval-augmented generation), they maintain a consistently high <b>unique access ratio</b> (the red columns above) — typically between 70% and 100%. This means that each loop, while generally increasing <b>accuracy</b> for the agent (represented here by the blue line), is constantly fetching new, unique content rather than revisiting previously seen pages.</p><p><b>This repeated access to long-tail assets churns the cache that human traffic relies on. That could make existing pre-fetching and traditional cache invalidation strategies less effective as the amount of crawler traffic increases.</b></p>
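As a toy illustration (with hypothetical URLs), the unique access ratio plotted above is simply the fraction of a crawl loop's fetches that the agent has never requested before:

```python
def unique_access_ratio(requests, seen):
    """Fraction of a loop's requests that target URLs this agent has
    never fetched before; updates the `seen` set in place."""
    new = sum(1 for url in requests if url not in seen)
    seen.update(requests)
    return new / len(requests)

# Two hypothetical crawl loops for an agent refining a query:
seen = set()
loop1 = ["/docs/a", "/docs/b", "/blog/c"]
loop2 = ["/docs/b", "/faq/d", "/api/e", "/api/f"]
r1 = unique_access_ratio(loop1, seen)  # 1.0: every fetch is new
r2 = unique_access_ratio(loop2, seen)  # 0.75: only /docs/b is revisited
```

A ratio near 1.0 on every loop means the agent rarely re-requests content it has already seen, so there is little for a cache to exploit.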
    <div>
      <h3>How does AI traffic impact cache?</h3>
      <a href="#how-does-ai-traffic-impact-cache">
        
      </a>
    </div>
    <p>For a <a href="https://www.cloudflare.com/learning/cdn/what-is-a-cdn/"><u>CDN</u></a>, a cache miss means having to go to the origin server to fetch the requested content. Think of a cache miss like your local library not having a book in house, so you have to wait to get the book from inter-library loan. You’ll get your book eventually, but it will take longer than you wanted. It will also inform your library that having that book in stock locally could be a good idea.</p><p>As a result of their broad, unpredictable access patterns over long-tail content, AI crawlers significantly raise the cache miss rate. And many of our typical methods to improve our cache hit rate, such as <a href="https://blog.cloudflare.com/introducing-speed-brain/"><u>cache speculation</u></a> or prefetching, are significantly less effective.</p><p>The first chart below shows the difference in cache hit rates for a single node in Cloudflare’s CDN with and without our <a href="https://radar.cloudflare.com/bots/directory?category=AI_CRAWLER&amp;kind=all"><u>identified AI crawlers</u></a>. While the impact of crawlers is still relatively limited, there is a clear drop in hit rate with the addition of AI crawler traffic. We manage our cache with an algorithm called “least recently used”, or LRU. This means that when storage space is full, the content requested least recently is evicted from cache first to make space for more popular content. The drop in hit rate implies that LRU is struggling under the repeated scan behavior of AI crawlers.</p><p>The bottom figure shows AI cache misses during this time. Each of those cache misses represents a request to the origin, slowing response times as well as increasing egress costs and load on the origin. </p>
          <figure>
          <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6rsbyos9tv8wzbbXJTrAYh/522b3fed76ce69bb96eb9aaff51ea1b1/image3.png" />
          </figure><p>This surge in AI bot traffic has had real-world impact. The following table from our paper shows the effects on several large websites. Each example links to its source report.</p><table><tr><td><p><b>System</b></p></td><td><p><b>Reported AI Traffic Behavior</b></p></td><td><p><b>Reported Impact</b></p></td><td><p><b>Reported Mitigations</b></p></td></tr><tr><td><p><a href="https://www.wikipedia.org/"><u>Wikipedia</u></a></p></td><td><p>Bulk image scraping for model training<a href="https://diff.wikimedia.org/2025/04/01/how-crawlers-impact-the-operations-of-the-wikimedia-projects/"><u><sup>1</sup></u></a></p></td><td><p>50% surge in multimedia bandwidth usage<a href="https://diff.wikimedia.org/2025/04/01/how-crawlers-impact-the-operations-of-the-wikimedia-projects/"><u><sup>1</sup></u></a></p></td><td><p>Blocked crawler traffic<a href="https://diff.wikimedia.org/2025/04/01/how-crawlers-impact-the-operations-of-the-wikimedia-projects/"><u><sup>1</sup></u></a></p></td></tr><tr><td><p><a href="https://sourcehut.org/"><u>SourceHut</u></a></p></td><td><p>LLM crawlers scraping code repositories<a href="https://incidentdatabase.ai/cite/1001/"><u><sup>2</sup></u></a><sup>,</sup><a href="https://status.sr.ht/issues/2025-03-17-git.sr.ht-llms/"><u><sup>3</sup></u></a> </p></td><td><p>Service instability and slowdowns<a href="https://incidentdatabase.ai/cite/1001/"><u><sup>2</sup></u></a><sup>,</sup><a href="https://status.sr.ht/issues/2025-03-17-git.sr.ht-llms/"><u><sup>3</sup></u></a> </p></td><td><p>Blocked crawler traffic<a href="https://incidentdatabase.ai/cite/1001/"><u><sup>2</sup></u></a><sup>,</sup><a href="https://status.sr.ht/issues/2025-03-17-git.sr.ht-llms/"><u><sup>3</sup></u></a> </p></td></tr><tr><td><p><a href="https://about.readthedocs.com/"><u>Read the Docs</u></a></p></td><td><p>AI crawlers download large files hundreds of times daily<a href="https://incidentdatabase.ai/cite/1001/"><u><sup>2</sup></u></a><sup>,</sup><a 
href="https://about.readthedocs.com/blog/2024/07/ai-crawlers-abuse/"><u><sup>4</sup></u></a></p></td><td><p>Significant bandwidth increase<a href="https://incidentdatabase.ai/cite/1001/"><u><sup>2</sup></u></a><sup>,</sup><a href="https://about.readthedocs.com/blog/2024/07/ai-crawlers-abuse/"><u><sup>4</sup></u></a></p></td><td><p>Temporarily blocked crawler traffic, performed IP-based rate limiting, reconfigured CDN to improve caching<a href="https://incidentdatabase.ai/cite/1001/"><u><sup>2</sup></u></a><sup>,</sup><a href="https://about.readthedocs.com/blog/2024/07/ai-crawlers-abuse/"><u><sup>4</sup></u></a></p></td></tr><tr><td><p><a href="https://www.fedoraproject.org/"><u>Fedora</u></a></p></td><td><p>AI scrapers recursively crawl package mirrors<a href="https://incidentdatabase.ai/cite/1001/"><u><sup>2</sup></u></a><sup>,</sup><a href="https://cryptodamus.io/en/articles/news/ai-web-scrapers-attacking-open-source-here-s-how-to-fight-back"><u><sup>5</sup></u></a><sup>,</sup><a href="https://www.scrye.com/blogs/nirik/posts/2025/03/15/mid-march-infra-bits-2025/"><u><sup>6</sup></u></a></p></td><td><p>Slow response for human users<a href="https://incidentdatabase.ai/cite/1001/"><u><sup>2</sup></u></a><sup>,</sup><a href="https://cryptodamus.io/en/articles/news/ai-web-scrapers-attacking-open-source-here-s-how-to-fight-back"><u><sup>5</sup></u></a><sup>,</sup><a href="https://www.scrye.com/blogs/nirik/posts/2025/03/15/mid-march-infra-bits-2025/"><u><sup>6</sup></u></a></p></td><td><p>Geo-blocked traffic from known bot sources along with blocking several subnets and even countries<a href="https://incidentdatabase.ai/cite/1001/"><u><sup>2</sup></u></a><sup>,</sup><a href="https://cryptodamus.io/en/articles/news/ai-web-scrapers-attacking-open-source-here-s-how-to-fight-back"><u><sup>5</sup></u></a><sup>,</sup><a href="https://www.scrye.com/blogs/nirik/posts/2025/03/15/mid-march-infra-bits-2025/"><u><sup>6</sup></u></a></p></td></tr><tr><td><p><a 
href="https://diasporafoundation.org/"><u>Diaspora</u></a></p></td><td><p>Aggressive scraping without respecting robots.txt<a href="https://diaspo.it/posts/2594"><u><sup>7</sup></u></a></p></td><td><p>Slow response and downtime for human users<a href="https://diaspo.it/posts/2594"><u><sup>7</sup></u></a></p></td><td><p>Blocked crawler traffic and added rate limits<a href="https://diaspo.it/posts/2594"><u><sup>7</sup></u></a></p></td></tr></table><p>The impact is severe: Wikimedia experienced a 50% surge in multimedia bandwidth usage due to bulk image scraping. Fedora, which hosts large software packages, and the Diaspora social network suffered from heavy load and poor performance for human users. Many others have noted bandwidth increases or slowdowns from AI bots repeatedly downloading large files. While blocking crawler traffic mitigates some of the impact, a smarter cache architecture would let site operators serve AI crawlers while maintaining response times for their human users.</p>
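The cache churn behind these incidents can be reproduced in a toy simulation (hypothetical traces and sizes, for illustration only): under LRU, a small set of popular "human" pages caches almost perfectly on its own, but interleaving a crawler's one-pass scan of unique long-tail URLs evicts every hot page before it can be reused.

```python
from collections import OrderedDict

def lru_hit_rate(trace, capacity):
    """Replay a request trace through an LRU cache; return the hit rate."""
    cache, hits = OrderedDict(), 0
    for url in trace:
        if url in cache:
            hits += 1
            cache.move_to_end(url)         # refresh recency on a hit
        else:
            cache[url] = True              # miss: fetch and insert
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(trace)

# 10 hot pages requested over and over: typical human traffic.
popular = [f"/page/{i % 10}" for i in range(1000)]
# 1,000 unique long-tail URLs fetched once each: a crawler's scan.
scan = [f"/archive/{i}" for i in range(1000)]
# Interleave the two streams as a shared cache would see them.
mixed = [u for pair in zip(popular, scan) for u in pair]

print(lru_hit_rate(popular, capacity=15))  # 0.99: the hot set fits in cache
print(lru_hit_rate(mixed, capacity=15))    # 0.0: the scan evicts every hot page before reuse
```

The scan traffic never re-requests anything, so it gains nothing from the cache itself, yet it pushes out the hot pages that human traffic depends on.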
    <div>
      <h3>AI-aware caching</h3>
      <a href="#ai-aware-caching">
        
      </a>
    </div>
    <p>AI crawlers power live applications such as <a href="https://www.cloudflare.com/learning/ai/retrieval-augmented-generation-rag/"><u>retrieval-augmented generation (RAG)</u></a> or real-time summarization, so latency matters. That’s why these requests should be routed to caches that can balance larger capacity with moderate response times. These caches should still preserve freshness, but can tolerate slightly higher access latency than human-facing caches. </p><p>AI crawlers are also used for building training sets and running large-scale content collection jobs. These workloads can tolerate significantly higher latency and are not time-sensitive. As such, their requests can be served from deep cache tiers that take longer to reach (e.g., origin-side SSD caches), or even delayed using queue-based admission or rate-limiters to prevent backend overload. This also opens the opportunity to defer bulk scraping when infrastructure is under load, without affecting interactive human or AI use cases.</p><p>Existing projects like Cloudflare’s <a href="https://blog.cloudflare.com/an-ai-index-for-all-our-customers/"><u>AI Index</u></a> and <a href="https://blog.cloudflare.com/markdown-for-agents/"><u>Markdown for Agents</u></a> allow website operators to present a simplified or reduced version of websites to known AI agents and bots. We're making plans to do much more to mitigate the impact of AI traffic on CDN cache, leading to better cache performance for everyone. With our collaborators at ETH Zurich, we’re experimenting with two complementary approaches: first, traffic filtering with AI-aware caching algorithms; and second, exploring the addition of an entirely new cache layer to siphon AI crawler traffic to a cache that will improve performance for both AI crawlers and human traffic. 
</p><p>There are several different types of cache replacement algorithms, such as LRU (“Least Recently Used”), LFU (“Least Frequently Used”), or FIFO (“First-In, First-Out”), that govern how a storage cache chooses to evict elements from the cache when a new element needs to be added and the cache is full. LRU is often the best balance of simplicity, low overhead, and effectiveness for generic situations, and is widely used. For mixed human and AI bot traffic, however, our initial experiments indicate that a different choice of cache replacement algorithm, particularly using <a href="https://cachemon.github.io/SIEVE-website/"><u>SIEVE</u></a> or <a href="https://s3fifo.com/"><u>S3FIFO</u></a>, could allow human traffic to achieve the same hit rate with or without AI interference. We are also experimenting with more directly workload-aware, machine-learning-based caching algorithms that customize cache behavior in real time for a faster and cheaper cache.</p><p>Long term, we expect that a separate cache layer for AI traffic will be the best way forward. Imagine a cache architecture that routes human and AI traffic to distinct tiers deployed at different layers of the network. Human traffic would continue to be served from edge caches located at CDN PoPs, which prioritize responsiveness and <a href="https://www.cloudflare.com/learning/cdn/what-is-a-cache-hit-ratio/"><u>cache hit rates</u></a>. For AI traffic, cache handling could vary by task type. </p>
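To make the contrast with LRU concrete, here is a simplified, hypothetical sketch of SIEVE's core idea (the published algorithm uses a linked list; this list-based toy keeps only the essentials): a hit merely sets a visited bit, and an eviction hand scans from the oldest entry, sparing and un-marking visited objects. One-pass scan traffic never gets its bit set, so it is evicted before popular content.

```python
class SieveCache:
    """Simplified SIEVE sketch: new objects enter a FIFO queue, a hit
    only sets a 'visited' bit, and eviction removes the first unvisited
    object the hand finds while scanning from oldest to newest."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = []    # index 0 = oldest, append = newest
        self.visited = {}  # url -> visited bit (also the membership index)
        self.hand = 0      # persistent eviction hand
        self.hits = self.misses = 0

    def _evict(self):
        while True:
            if self.hand >= len(self.queue):
                self.hand = 0                  # wrap back to the oldest entry
            url = self.queue[self.hand]
            if self.visited[url]:
                self.visited[url] = False      # spare it, clear the bit, move on
                self.hand += 1
            else:
                self.queue.pop(self.hand)      # evict first unvisited object
                del self.visited[url]
                return

    def get(self, url):
        if url in self.visited:
            self.hits += 1
            self.visited[url] = True           # lazy promotion: no list movement
        else:
            self.misses += 1
            if len(self.queue) >= self.capacity:
                self._evict()
            self.queue.append(url)
            self.visited[url] = False

# A hot page amid a one-pass scan of unique URLs (hypothetical trace):
cache = SieveCache(capacity=4)
for url in ["/hot", "/hot", "/s1", "/s2", "/s3", "/s4", "/hot"]:
    cache.get(url)
print(cache.hits)  # 2: /hot survives the scan; LRU would have evicted it here
```

Because a hit costs only a bit-flip and scan objects are never marked visited, this family of algorithms stays cheap while resisting exactly the churn pattern AI crawlers produce.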
    <div>
      <h3>This is just the beginning</h3>
      <a href="#this-is-just-the-beginning">
        
      </a>
    </div>
    <p>The impact of AI bot traffic on cloud infrastructure is only going to grow over the next few years. We need better characterization of the effects on CDNs across the globe, along with bold new cache policies and architectures to address this novel workload and help make a better Internet. </p><p>Cloudflare is already solving the problems we’ve laid out here. Cloudflare reduces bandwidth costs for customers who experience high bot traffic with our AI-aware caching, and with our <a href="https://www.cloudflare.com/ai-crawl-control/"><u>AI Crawl Control</u></a> and <a href="https://www.cloudflare.com/paypercrawl-signup/"><u>Pay Per Crawl</u></a> tools, we give customers better control over who programmatically accesses their content.</p><p>We’re just getting started exploring this space. If you're interested in building new ML-based caching algorithms or designing these new cache architectures, please apply for an internship! We have <a href="https://www.cloudflare.com/en-gb/careers/jobs/?department=Early+Talent"><u>open internship positions</u></a> in Summer and Fall 2026 to work on this and other exciting problems at the intersection of AI and Systems.  </p> ]]></content:encoded>
            <category><![CDATA[Research]]></category>
            <category><![CDATA[Cache]]></category>
            <guid isPermaLink="false">635WBzM8GMiVZhyzKFeWMf</guid>
            <dc:creator>Avani Wildani</dc:creator>
            <dc:creator>Suleman Ahmad</dc:creator>
        </item>
    </channel>
</rss>