April 23, 2013

What CloudFlare Logs

3 minute read

Over the last few weeks, we've had a number of requests for information about what data CloudFlare logs when someone visits a site on our network. While we have provided a Privacy Policy that outlines how we keep information private, I wanted to take the time to clarify our customer log retention policies.

What CloudFlare Logs

When you visit a site on CloudFlare's network, we record information about that visit. If you run a web server you'll be familiar with these logs as they're similar to an Apache access log. We log data for two reasons: 1) to help us identify security threats and attacks hitting our customers in order to mitigate them; and 2) in order to identify performance bottlenecks and errors on our system.

It's somewhat hard to fathom the scale of the log data that we generate. Every minute of every day we generate more than 20GB (compressed) of log data. That translates, at our current volume, to more than 10 Petabytes of storage needed to store a year's worth of logs, and, due to our continued growth, that volume that has been doubling every 4 months or so. Today, even if we wanted to, we don't have the ability to retain all the logs we generate. This means that, for most customers, we discard access logs within 4 hours of them being recorded.

For our Enterprise customers, we offer an optional feature that allows them to export their raw log files in Apache format. This requires us to store log files for a longer period of time in order to allow them to be downloaded. By default, we store logs for these customers for 3 days.

Crunching Data

Since CloudFlare does not keep the raw logs, it is impossible for us to answer questions like: tell me all the visitors who have been to a particular website on CloudFlare's network.

However, CloudFlare does generate aggregate data, so we can provide analytics back to customers. We use the aggregated data to populate things like the CloudFlare Analytics page which includes numbers of hits, page views, bandwidth consumed and unique visitors. As logs are received, we run a stream processing engine that extracts this summary data. This data is correlated in each of our edge data centers and then sent to one of our core facilities in order to report through our UI.

This same data summary engine also looks for attack patterns, which is then used to provide security protection for our customer's websites. Using this engine, we can identify an attack on one site, usually in less than 1 minute, and then push updated security rules that then protect every site using CloudFlare from that same attack.

Access logs for most customers are stored briefly at the edge of our network and then deleted within 4 hours. If there is an error, those logs are transmitted back to one of our core facilities in order for us to diagnose the error. Error logs sent to core are currently kept for 1 week then discarded.

The Future

Going forward, we want to allow customers who would like to have more insight into the visitors to their sites to be able to choose to do so. As we do, we will provide details on how any feature we add changes our log retention policy, and we will continue to be guided by the principle that our customers should be able to understand and control what data is being stored about visitors to their sites.

What CloudFlare Logs

Crunching Data

The Future

Related tags

Subscribe to receive notifications of new posts