Millions of customers trust Cloudflare to accelerate their website, protect their network, or as a platform to build their own applications. But, once you’re running in production, how do you know what’s going on with your application? You need logs from Cloudflare – a record of what happened on our network when your customers interacted with your product that uses Cloudflare.
Cloudflare Logs are an indispensable tool for debugging applications, identifying security vulnerabilities, or just understanding how users are interacting with your product. However, our customers generate petabytes of logs, and store them for months or years at a time. Log data is tantalizing: all those answers, just waiting to be revealed with the right query! But until now, it’s been too hard for customers to actually store, search, and understand their logs without expensive and cumbersome third party tools.
Today we’re announcing Cloudflare Logs Engine: a new product to enable any kind of investigation with Cloudflare Logs — all within Cloudflare.
Starting today, Cloudflare customers who push their logs to R2 can retrieve them by time range and unique identifier. Over the coming months we want to enable customers to:
- Store logs for any Cloudflare dataset, for as long as you want, with a few clicks
- Access logs no matter what plan you use, without relying on third party tools
- Write queries that include multiple datasets
- Quickly identify the logs you need and take action based on what you find
Why Cloudflare Logs?
When it comes to visibility into your traffic, most customers start with analytics. Cloudflare dashboard is full of analytics about all of our products, which give a high-level overview of what’s happening: for example, number of requests served, the ratio of cache hits, or the amount of CPU time used.
But sometimes, more detail is needed. Developers especially need to be able to read individual log lines to debug applications. For example, suppose you notice a problem where your application throws an error in an unexpected way – you need to know the cause of that error and see every request with that pattern.
Cloudflare offers tools like Instant Logs and wrangler tail which excel at real-time debugging. These are incredibly helpful if you’re making changes on the fly, or if the problem occurs frequently enough that it will appear during your debugging session.
In other cases, you need to find that needle in a haystack — the one rare event that causes everything to go wrong. Or you might have identified a security issue and want to make sure you’ve identified every time that issue could have been exploited in your application’s history.
When this happens, you need logs. In particular, you need forensics: the ability to search the entire history of your logs.
A brief overview of log analysis
Before we take a look at Logs Engine itself, I want to briefly talk about alternatives – how have our customers been dealing with their logs so far?
Cloudflare has long offered Logpull and Logpush. Logpull enables enterprise customers to store their HTTP logs on Cloudflare for up to seven days, and retrieve them by either time or RayID. Logpush can send your Cloudflare logs just about anywhere on the Internet, quickly and reliably. While Logpush provides more flexibility, it’s been up to customers to actually store and analyze those logs.
Cloudflare has a number of partnerships with SIEMs and data warehouses/data lakes. Many of these tools even have pre-built Cloudflare dashboards for easy visibility. And third party tools have a big advantage in that you can store and search across many log sources, not just Cloudflare.
That said, we’ve heard from customers that they have some challenges with these solutions.
First, third party log tooling can be expensive! Most tools require that you pay not just for storage, but for indexing all of that data when it’s ingested. While that enables powerful search functionality later on, Cloudflare (by its nature) is often one of the largest emitters of logs that a developer will have. If you were to store and index every log line we generate, it can cost more money to analyze the logs than to deliver the actual service.
Second, these tools can be hard to use. Logs are often used to track down an issue that customers discover via analytics in the Cloudflare dashboard. After finding what you need in logs, it can be hard to get back to the right part of the Cloudflare dashboard to make the appropriate configuration changes.
Finally, Logpush was previously limited to Enterprise plans. Soon, we will start offering these services to customers at any scale, regardless of plan type or how they choose to pay.
Why Logs Engine?
With Logs Engine, we wanted to solve these problems. We wanted to build something affordable, easy to use, and accessible to any Cloudflare customer. And we wanted it to work for any Cloudflare logs dataset, for any span of time.
Our first insight was that to make logs affordable, we need to separate storage and compute. The cost of Storage is actually quite low! Thanks to R2, there’s no reason many of our customers can’t store all of their logs for long periods of time. At the same time, we want to separate out the analysis of logs so that customers only pay for the compute of logs they analyze – not every line ingested. While we’re still developing our query pricing, our aim is to be predictable, transparent and upfront. You should never be surprised by the cost of a query (or land a huge bill by accident).
It’s great to separate storage and compute. But, if you need to scan all of your logs anyway to answer the first question you have, you haven’t gained any benefits to this separation. In order to realize cost savings, it’s critical to narrow down your search before executing a query. That’s where our next big idea came in: a tight integration with analytics.
Most of the time, when analyzing logs, you don’t know what you’re looking for. For example, if you’re trying to find the cause of a specific origin status code, you may need to spend some time understanding which origins are impacted, which clients are sending them, and the time range in which these errors happened. Thanks to our ABR analytics, we can provide a good summary of the data very quickly – but not the exact details of what happened. By integrating with analytics, we can help customers narrow down their queries, then switch to Logs Engine once you know exactly what you’re looking for.
Finally, we wanted to make logs accessible to anyone. That means all plan types – not just Enterprise.
Additionally, we want to make it easy to both set up log storage and analysis, and also to take action on logs once you find problems. With Logs Engine, it will be possible to search logs right from the dashboard, and to immediately create rules based on the patterns you find there.
What’s available today and our roadmap
Today, Enterprise customers can store logs in R2 and retrieve them via time range. Currently in beta, we also allow customers to retrieve logs by RayID (see our companion blog post) — to join the beta, please email [email protected].
Coming soon, we will enable customers on all plan types — not just Enterprise — to ingest logs into Logs Engine. Details on pricing will follow soon.
We also plan to build more powerful querying capability, beyond time range and RayID lookup. For example, we plan to support arbitrary filtering on any column, plus more expressive queries that can look across datasets or aggregate data.
But why stop at logs? This foundation lays the groundwork to support other types of data sources and queries one day. We are just getting started. Over the long term, we’re also exploring the ability to ingest data sources outside of Cloudflare and query them. Paired with Analytics Engine this is a formidable way to explore any data set in a cost-effective way!