Last Thursday we released details on a bug in Cloudflare's parser impacting our customers. It was an extremely serious bug that caused data flowing through Cloudflare's network to be leaked onto the Internet. We fully patched the bug within hours of being notified. However, given the scale of Cloudflare, the impact was potentially massive.
The bug has been dubbed “Cloudbleed.” Because of its potential impact, the bug has been written about extensively and generated a lot of uncertainty. The burden of that uncertainty has been felt by our partners, customers, and our customers’ customers. The question we’ve been asked most often is: what risk does Cloudbleed pose to me?
We've spent the last twelve days using log data on the actual requests we’ve seen across our network to get a better grip on what the impact was and, in turn, provide an estimate of the risk to our customers. This post outlines our initial findings.
The summary is that, while the bug was very bad and had the potential to be much worse, based on our analysis so far: 1) we have found no evidence based on our logs that the bug was maliciously exploited before it was patched; 2) the vast majority of Cloudflare customers had no data leaked; 3) after a review of tens of thousands of pages of leaked data from search engine caches, we have found a large number of instances of leaked internal Cloudflare headers and customer cookies, but we have not found any instances of passwords, credit card numbers, or health records; and 4) our review is ongoing.
To make sense of the analysis, it's important to understand exactly how the bug was triggered and when data was exposed. If you feel like you've already got a good handle on how the bug was triggered, feel free to skip ahead to the analysis.
Triggering the Bug
One of Cloudflare's core applications is a stream parser. The parser scans content as it is delivered from Cloudflare's network and is able to modify it in real time. The parser is used to enable functions like automatically rewriting links from HTTP to HTTPS (Automatic HTTPS Rewrites), hiding email addresses on pages from email harvesters (Email Address Obfuscation), and other similar features.
The Cloudbleed bug was triggered when a page with two characteristics was requested through Cloudflare's network. The two characteristics were: 1) the HTML on the page needed to be broken in a specific way; and 2) a particular set of Cloudflare features needed to be turned on for the page in question.
The specific HTML flaw was that the page had to end with an unterminated attribute. In other words, something like:
<IMG HEIGHT="50px" WIDTH="200px" SRC="
Here's why that mattered. When a page for a particular customer is being parsed it is stored in memory on one of the servers that is a part of our infrastructure. Contents of the other customers' requests are also in adjacent portions of memory on Cloudflare's servers.
The bug caused the parser, when it encountered an unterminated attribute at the end of a page, to not stop when it reached the end of the portion of memory for the particular page being parsed. Instead, the parser continued to read from adjacent memory, which contained data from other customers' requests. The contents of that adjacent memory were then dumped onto the page with the flawed HTML.
The screenshot above is an example of how data was dumped on pages. Most of the data was random binary data that the browser attempted to render as text, which is why it appears largely as Asian characters. That is followed by a number of internal Cloudflare headers.
If you had accessed one of the pages that triggered the bug, you would have seen what likely looked like random text at the end of the page. The amount of data dumped varied randomly in length, limited by the size of the heap or cut short when the parser happened across a character that caused the output to terminate.
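To make the failure mode concrete, here is a minimal Python sketch that simulates the general shape of the problem: a scanner whose cursor can end up just past the end of the page it is parsing, guarded only by an equality check, so an unterminated attribute lets it copy whatever sits in adjacent memory. It is an illustration only; the buffer layout, names, and the exact check are simplified stand-ins for what the real parser (generated C running on our edge servers, not Python) actually does.

```python
# Toy simulation of the over-read (hypothetical names and layout; the real
# parser is generated C operating on buffers inside our edge servers).

# One contiguous region of "memory": the flawed page, followed immediately by
# another customer's request data.
page = b'<IMG HEIGHT="50px" WIDTH="200px" SRC="'
neighbour = (b'GET /account HTTP/1.1\r\n'
             b'Cookie: session=7f3a9c...\r\n'
             b'X-Internal-Example: end"\r\n')
memory = page + neighbour
page_end = len(page)                    # index one past the page being parsed

# The scanner sees SRC=" at the very end of the page, performs its
# "start of attribute value" step, and leaves its cursor one byte beyond
# page_end -- the situation the unterminated attribute makes possible.
cursor = memory.rfind(b'SRC="') + len(b'SRC="') + 1     # == page_end + 1

leaked = bytearray()
while cursor != page_end:               # flawed guard: equality, not a bound
    byte = memory[cursor:cursor + 1]
    if byte == b'"' or byte == b'':     # closing quote, or end of toy buffer
        break                           # (a real server has no such backstop)
    leaked += byte
    cursor += 1

print(leaked.decode('latin-1'))         # contains the neighbour's cookie
```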
Code Path and a New Parser
In addition to a page with flawed HTML, the particular set of Cloudflare features that were enabled mattered because it determined the version of the parser that was used. We rolled out a new version of the parser code on 22 September 2016. This new version of the parser exposed the bug.
Initially, the new parser code would only get executed under a very limited set of circumstances. Fewer than 180 sites from 22 September 2016 through 13 February 2017 had the combination of the HTML flaw and the set of features that would trigger the new version of the parser. During that time period, pages that had both characteristics and therefore would trigger the bug were accessed an estimated 605,037 times.
On 13 February 2017, not aware of the bug, we expanded the circumstances under which the new parser would get executed. That expanded the number of sites where the bug could get triggered from fewer than 180 to 6,457. From 13 February 2017 through 18 February 2017, when we patched the bug, the pages that would trigger the bug were accessed an estimated 637,034 times. In total, between 22 September 2016 and 18 February 2017 we now estimate based on our logs the bug was triggered 1,242,071 times.
The pages that triggered the bug tended to be on small and infrequently accessed sites. When one of these vulnerable pages was accessed and the bug was triggered, it was random which other customers' content happened to be in adjacent memory and would therefore be leaked. Higher-traffic Cloudflare customers were more likely to have data leaked simply because they received more requests and so, probabilistically, were more likely to have content in memory at any given time.
To be clear, customers that had data leak did not need to have flawed HTML or any particular Cloudflare features enabled. They just needed to be unlucky and have their data in memory immediately following a page that triggered the bug.
How a Malicious Actor Would Exploit the Bug
The Cloudbleed bug wasn't like a typical data breach. To analogize to the physical world, a typical data breach would be like a robber breaking into your office and stealing all your file cabinets. The bad news in that case is that the robber has all your files. The good news is you know exactly what they have.
Cloudbleed is different. It's more akin to learning that a stranger may have listened in on two employees at your company talking over lunch. The good news is that the amount of information exposed in any one overheard conversation is limited. The bad news is you can't know exactly what the stranger may have heard, including potentially sensitive information about your company.
If a stranger were listening in on a conversation between two employees, the vast majority of what they would hear wouldn't be harmful. But, every once in a while, the stranger might overhear something confidential. The same is true if a malicious attacker knew about the bug and were trying to exploit it. Given that the data that leaked was random on a per-request basis, most requests would return nothing interesting. But, every once in a while, the data that leaked might contain something of interest to a hacker.
If a hacker were aware of the bug before it was patched and trying to exploit it then the best way for them to do so would be to send as many requests as possible to a page that contained the set of conditions that would trigger the bug. They could then record the results. Most of what they would get would be useless, but some would contain very sensitive information.
The nightmare scenario we have been worried about is if a hacker had been aware of the bug and had been quietly mining data before we were notified by Google's Project Zero team and were able to patch it. For the last twelve days we've been reviewing our logs to see if there's any evidence to indicate that a hacker was exploiting the bug before it was patched. We’ve found nothing so far to indicate that was the case.
Identifying Patterns of Malicious Behavior
For a limited period of time we keep a debugging log of requests that pass through Cloudflare. This is done by sampling 1% of requests and storing information about the request and response. We are then able to look back in time for anomalies in HTTP response codes, response or request body sizes, response times, or other unusual behavior from specific networks or IP addresses.
We have the logs of 1% of all requests going through Cloudflare from 8 February 2017 up to 18 February 2017 (when the vulnerability was patched) giving us the ability to look for requests leaking data during this time period. Requests prior to 8 February 2017 had already been deleted. Because we have a representative sample of the logs for the 6,457 vulnerable sites, we were able to parse them in order to look for any evidence someone was exploiting the bug.
The first thing we looked for was a site we knew was vulnerable and for which we had accurate data. In the early hours of 18 February 2017, immediately after the problem was reported to us, we set up a vulnerable page on a test site and used it to reproduce the bug and then verify it had been fixed.
Because we had logging on the test web server itself we were able to quickly verify that we had the right data. The test web server had received 31,874 hits on the vulnerable page due to our testing. We had captured very close to 1% of those requests (316 were stored). From the sampled data, we were also able to look at the sizes of responses which showed a clear bimodal distribution. Small responses were from when the bug was fixed, large responses from when the leak was apparent.
This gave us confidence that we had captured the right information to go hunting for exploitation of the vulnerability.
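As a rough illustration of that sanity check, the sketch below splits sampled response sizes for a single page into the two clusters we expected to see. The log format, field names, and size threshold are hypothetical, not our actual schema.

```python
# Sketch: confirm the sampled logs show the expected two clusters of response
# sizes for the test page (hypothetical log format, not our real schema).
import json
from statistics import mean

def split_response_sizes(log_lines, page_url, threshold=50_000):
    """Partition sampled response sizes for one URL into small/large groups;
    `threshold` is an illustrative cut-off between a normal response and one
    padded with leaked memory."""
    small, large = [], []
    for line in log_lines:
        record = json.loads(line)
        if record["url"] != page_url:
            continue
        size = record["response_bytes"]
        (large if size > threshold else small).append(size)
    return small, large

# Made-up sampled records showing the kind of bimodal split we saw.
sample = [
    '{"url": "/test", "response_bytes": 8123}',
    '{"url": "/test", "response_bytes": 7980}',
    '{"url": "/test", "response_bytes": 93211}',
    '{"url": "/test", "response_bytes": 90874}',
]
small, large = split_response_sizes(sample, "/test")
print(len(small), round(mean(small)), "|", len(large), round(mean(large)))
```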
We wanted to answer two questions:
1. Did any individual IP hit a vulnerable page enough times that a meaningful amount of data was extracted? This would capture the situation where someone had discovered the problem on a web page and had set up a process to repeatedly download the page from their machine. For example, something as simple as running curl in a loop would show up in this analysis.
2. Was any vulnerable page accessed enough times that a meaningful amount of data could have been extracted by a botnet? A more advanced hacker would have wanted to cover their footprints by using a wide range of IP addresses rather than repeatedly visiting a page from a single IP. To identify that possibility we wanted to see if any individual page had been accessed enough times and returned enough data for us to suspect that data was being extracted.
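Below is a simplified sketch of the kind of scan these two questions imply over the 1% sampled logs. The record fields, thresholds, and inputs are hypothetical stand-ins rather than our production tooling; pages flagged by the second scan would then be replayed through the vulnerable parser build, as described in the next section.

```python
# Sketch of the two scans over the 1% sampled logs (hypothetical field names,
# thresholds, and inputs -- not our production tooling).
from collections import defaultdict

HIT_THRESHOLD = 1_000                      # "enough times" from the questions above
SAMPLED_THRESHOLD = HIT_THRESHOLD // 100   # 1% sample: ~10 sampled hits ≈ 1,000 real hits

def scan_sampled_logs(records, normal_page_bytes):
    """records: dicts with ip, url, and response_bytes for requests to the
    6,457 potentially vulnerable sites.
    normal_page_bytes: expected response size per URL, used to spot pages
    returning far more data than they normally deliver."""
    per_ip_page = defaultdict(lambda: {"hits": 0, "bytes": 0})   # question 1
    per_page = defaultdict(lambda: {"hits": 0, "bytes": 0})      # question 2

    for r in records:
        key = (r["ip"], r["url"])
        per_ip_page[key]["hits"] += 1
        per_ip_page[key]["bytes"] += r["response_bytes"]
        per_page[r["url"]]["hits"] += 1
        per_page[r["url"]]["bytes"] += r["response_bytes"]

    # Question 1: single IPs hammering one page and pulling down more data
    # than that page normally serves (2x is an illustrative multiplier).
    suspicious_ips = [
        (ip, url) for (ip, url), s in per_ip_page.items()
        if s["hits"] >= SAMPLED_THRESHOLD
        and s["bytes"] > 2 * s["hits"] * normal_page_bytes.get(url, float("inf"))
    ]

    # Question 2: any page accessed often enough, from any mix of IPs, to have
    # leaked a meaningful amount of data.
    busy_pages = [url for url, s in per_page.items()
                  if s["hits"] >= SAMPLED_THRESHOLD]

    return suspicious_ips, busy_pages
```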
Reviewing the Logs
To answer #1, we looked for any IP addresses that had hit a single page on a vulnerable site more than 1,000 times and downloaded more data than the site would normally deliver. We found 7 IP addresses with those characteristics.
Six of the seven IP addresses were accessing three sites with three pages with very large HTML. Manual inspection showed that these pages did not contain the broken HTML that would have triggered the bug. They also did not appear in a database of potentially vulnerable pages that our team gathered after the bug was patched.
The other IP address belonged to a mobile network and was traffic for a ticket booking application. The particular page was very large, but it was not leaking data: it did not contain broken HTML and was not in our database of vulnerable pages.
To look for evidence of #2, we retrieved every page on a vulnerable site that was requested more than 1,000 times during the period. We then downloaded those pages and ran them through the vulnerable version of our software in a test environment to see if any of them would cause a leak. This search turned up the sites we had created to test the vulnerability. However, we found no vulnerable pages, outside of our own test sites, that had been accessed more than 1,000 times.
This leads us to believe that the vulnerability had not been exploited between 8 February 2017 and 18 February 2017. However, we also wanted to look for signs of exploitation between 22 September 2016 and 8 February 2017 — a time period for which we did not have sampled log data. To do that, we turned to our customer analytics database.
Reviewing Customer Analytics
We store customer analytics data with one hour granularity in a large datastore. For every site on Cloudflare and for each hour we have the total number of requests to the site, number of bytes read from the origin web server, number of bytes sent to client web browsers, and the number of unique IP addresses accessing the site.
If a malicious attacker were sending a large number of requests to exploit the bug then we hypothesized that a number of signals would potentially appear in our logs. These include:
* The ratio of requests per unique IP would increase. While an attacker could use a botnet or a large number of machines to harvest data, we speculated that, at least initially upon discovering the bug, the hacker would send a large number of requests from a small set of IPs to gather initial data.
* The ratio of bandwidth per request would increase. Since the bug leaks a large amount of data onto the page, if the bug were being exploited then the bandwidth per request would increase.
* The ratio of bandwidth per unique IP would also increase. Since more data would be going to the smaller set of IPs the attacker used to pull down data, the bandwidth per unique IP would increase.
We used the data from before the bug impacted sites to set the baseline for each site for each of these three ratios. We then tracked the ratios above across each site individually during the period for which it was vulnerable and looked for anomalies that may suggest a hacker was exploiting the vulnerability ahead of its public disclosure.
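A condensed sketch of that comparison might look like the following, assuming hourly rollups with hypothetical field names; the three-standard-deviation cut-off is illustrative, not the exact threshold we used.

```python
# Sketch: flag hourly anomalies in the three ratios against each site's
# pre-bug baseline (hypothetical field names; 3-sigma cut-off is illustrative).
from statistics import mean, stdev

RATIOS = {
    "requests_per_ip":   lambda h: h["requests"] / max(h["unique_ips"], 1),
    "bytes_per_request": lambda h: h["bytes_to_clients"] / max(h["requests"], 1),
    "bytes_per_ip":      lambda h: h["bytes_to_clients"] / max(h["unique_ips"], 1),
}

def find_anomalies(baseline_hours, vulnerable_hours, sigmas=3.0):
    """baseline_hours: hourly rollups for a site from before 22 September 2016.
    vulnerable_hours: hourly rollups for the period the site was vulnerable.
    Returns (hour, ratio_name) pairs that sit well above the baseline."""
    anomalies = []
    for name, ratio in RATIOS.items():
        base = [ratio(h) for h in baseline_hours]
        mu = mean(base)
        sd = stdev(base) if len(base) > 1 else 0.0
        for h in vulnerable_hours:
            if ratio(h) > mu + sigmas * max(sd, 1e-9):
                anomalies.append((h["hour"], name))
    return anomalies
```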
The customer analytics data is much noisier than the sampled log data because it is rolled up and averaged over one-hour windows. However, we have not seen any evidence of exploitation of this bug in this data.
Reviewing Crash Data
Lastly, when the bug was triggered it would, depending on what was read from memory, sometimes cause our parser application to crash. We have technical operations logs that record every time an application running on our network crashes. These logs cover the entire period of time the bug was in production (22 September 2016 – 18 February 2017).
We ran a suite of known-vulnerable HTML through our test platform to establish the percentage of time that we would expect the application to crash.
We reviewed our application crash logs for the entire period the bug was in production. We did turn up periodic instances of the parser crashing that align with the frequency of how often we estimate the bug was triggered. However, we did not see a signal in the crash data that would indicate that the bug was being actively exploited at any point during the period it was present in our system.
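As a simplified illustration, that comparison boils down to something like the sketch below; the per-trigger crash probability and the spike tolerance are placeholders, not the measured values from our test platform.

```python
# Sketch: compare observed parser crashes per day with the count we would
# expect from the estimated number of bug triggers (placeholder numbers).

def expected_crashes(triggers_per_day, crash_probability):
    """Expected daily crash count if crashes come only from the bug firing."""
    return triggers_per_day * crash_probability

def flag_crash_spikes(observed_daily, triggers_per_day, crash_probability,
                      tolerance=3.0):
    """Return days where observed crashes exceed `tolerance` times the
    expectation -- a possible sign of someone hammering vulnerable pages."""
    expected = expected_crashes(triggers_per_day, crash_probability)
    return [day for day, count in observed_daily.items()
            if count > tolerance * max(expected, 1.0)]

# Placeholder example: ~8,300 estimated triggers per day (1,242,071 spread
# over roughly 150 days) and a hypothetical 1% crash rate per trigger.
print(expected_crashes(1_242_071 / 150, 0.01))   # ≈ 83 expected crashes/day
```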
Purging Search Engine Caches
Even if an attacker wasn’t actively exploiting the bug prior to our patching it, there was still potential harm because private data leaked and was cached by various automated crawlers. Because the 6,457 sites that could trigger the bug were generally small, the largest percentage of their traffic comes from search engine crawlers. Of the 1,242,071 requests that triggered the bug, we estimate more than half came from search engine crawlers.
Cloudflare has spent the last 12 days working with various search engines — including Google, Bing, Yahoo, Baidu, Yandex, DuckDuckGo, and others — to clear their caches. We were able to remove the majority of the cached pages before the disclosure of the bug last Thursday.
Since then, we’ve worked with major search engines as well as other online archives to purge cached data. We’ve successfully removed more than 80,000 unique cached pages. That underestimates the total number because we’ve requested search engines purge and recrawl entire sites in some instances. Cloudflare customers who discover leaked data still online can report it by sending a link to the cache to [email protected] and our team will work to have it purged.
Analysis of What Data Leaked
The search engine caches provide us an opportunity to analyze what data leaked. While many have speculated that any data passing through Cloudflare may have been exposed, the way that data is structured in memory and the frequency of GET versus POST requests makes certain data more or less likely to be exposed. We analyzed a representative sample of the cached pages retrieved from search engine caches and ran a thorough analysis on each of them. The sample included thousands of pages and was statistically significant to a confidence level of 99% with a margin of error of 2.5%. Within that sample we would expect the following data types to appear this many times in any given leak:
Data Type                          Expected Instances per Leak
---------                          ---------------------------
Internal Cloudflare Headers        67.54
Cookies                            0.44
Authorization Headers / Tokens     0.04
Passwords                          0
Credit Cards / Bitcoin Addresses   0
Health Records                     0
Social Security Numbers            0
Customer Encryption Keys           0
The above can be read to mean that in any given leak you would expect to find 67.54 Cloudflare internal headers. You’d expect to find a cookie in a little under half of all leaks (0.44 cookies per leak on average). We did not find any passwords, credit cards, health records, social security numbers, or customer encryption keys in the sample set.
Since this is just a sample, it is not correct to conclude that no passwords, credit cards, health records, social security numbers, or customer encryption keys were ever exposed. However, if there was any exposure, based on the data we’ve reviewed, it does not appear to have been widespread. We have also not had any confirmed reports of third parties discovering any of these sensitive data types on any cached pages.
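As a back-of-the-envelope check on the sample itself, the sample size implied by a 99% confidence level and a 2.5% margin of error (using the standard worst-case proportion of 0.5) works out to a few thousand pages, consistent with the sample described above:

```python
# Back-of-the-envelope sample size for 99% confidence and a 2.5% margin of
# error, using the standard worst-case proportion p = 0.5.
from math import ceil

z = 2.576    # z-score for a 99% confidence level
p = 0.5      # worst-case proportion
e = 0.025    # margin of error

n = ceil(z ** 2 * p * (1 - p) / e ** 2)
print(n)     # ≈ 2,655 pages
```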
These findings generally make sense given what we know about traffic to Cloudflare sites. Based on our logs, the ratio of GET to POST requests across our network is approximately 100-to-1. Since POSTs are more likely to contain sensitive data like passwords, we estimate that reduces the potential exposure of the most sensitive data from 1,242,071 requests to closer to 12,420. POSTs that contain particularly sensitive information would then represent only a fraction of the 12,420 we would expect to have leaked.
This is not to downplay the seriousness of the bug. For instance, depending on how a Cloudflare customer’s systems are implemented, cookie data, which would be present in GET requests, could be used to impersonate another user’s session. We’ve seen approximately 150 Cloudflare customers’ data in the more than 80,000 cached pages we’ve purged from search engine caches. When data for a customer is present, we’ve reached out to the customer proactively to share the data that we’ve discovered and help them work to mitigate any impact. Generally, if customer data was exposed, invalidating session cookies and rolling any internal authorization tokens is the best advice to mitigate the largest potential risk based on our investigation so far.
How to Understand Your Risk
We have tried to quantify the risk to individual customers that their data may have leaked. Generally, the more requests that a customer sent to Cloudflare, the more likely it is that their data would have been in memory and therefore exposed. This is anecdotally confirmed by the 150 customers whose data we’ve found in third party caches. The customers whose data appeared in caches are typically the customers that send the most requests through Cloudflare’s network.
Probabilistically, we are able to estimate the likelihood of data leaking for a particular customer based on the number of requests per month (RPM) that they send through our network since the more requests sent through our network the more likely a customer’s data is to be in memory when the bug was triggered. Below is a chart of the number of total anticipated data leak events from 22 September 2016 – 18 February 2017 that we would expect based on the average number of requests per month a customer sends through Cloudflare’s network:
Requests per Month Anticipated Leaks
------------------ -----------------
200B – 300B 22,356 – 33,534
100B – 200B 11,427 – 22,356
50B – 100B 5,962 – 11,427
10B – 50B 1,118 – 5,962
1B – 10B 112 – 1,118
500M – 1B 56 – 112
250M – 500M 25 – 56
100M – 250M 11 – 25
50M – 100M 6 – 11
10M – 50M 1 – 6
< 10M < 1
More than 99% of Cloudflare’s customers send fewer than 10 million requests per month. At that level, probabilistically we would expect that they would have no data leaked during the period the bug was present. For further context, the 100th largest website in the world is estimated to handle fewer than 10 billion requests per month, so there are very few of Cloudflare’s 6 million customers that fall into the top bands of the chart above. Cloudflare customers can find their own RPM by logging into the Cloudflare Analytics Dashboard and looking at the number of requests per month for their sites.
The statistics above assume that each leak contained only one customer’s data. That was true for nearly all of the leaks we reviewed from search engine caches. However, there were instances where more data may have been leaked. The probability table above should be considered just an estimate to help provide some general guidance on the likelihood a customer’s data would have leaked.
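For readers who want a feel for how a table like the one above is derived, here is a rough sketch of a proportional exposure model. The total network volume figure below is a placeholder (not a published number), and the model assumes each trigger leaked at most one customer's data, chosen in proportion to that customer's share of traffic, so it will not reproduce the table exactly.

```python
# Rough proportional exposure model (placeholder total; assumes each trigger
# leaked at most one customer's data, chosen in proportion to traffic share).

TOTAL_TRIGGERS = 1_242_071      # estimated triggers, 22 Sep 2016 - 18 Feb 2017
TOTAL_NETWORK_RPM = 12e12       # placeholder: total requests/month network-wide

def anticipated_leaks(customer_rpm,
                      total_triggers=TOTAL_TRIGGERS,
                      total_network_rpm=TOTAL_NETWORK_RPM):
    """Expected number of leak events containing this customer's data."""
    return total_triggers * customer_rpm / total_network_rpm

for rpm in (10_000_000, 1_000_000_000, 100_000_000_000):
    print(f"{rpm:>15,} requests/month -> ~{anticipated_leaks(rpm):,.1f} leaks")
```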
Interim Conclusion
We are continuing to work with third party caches to expunge leaked data and will not let up until every bit has been removed. We also continue to analyze Cloudflare’s logs and the particular requests that triggered the bug for anomalies. While we were able to mitigate this bug within minutes of it being reported to us, we want to ensure that other bugs are not present in the code. We have undertaken a full review of the parser code to look for any additional potential vulnerabilities. In addition to our own review, we're working with the outside code auditing firm Veracode to review our code.
Cloudflare’s mission is to help build a better Internet. Everyone on our team comes to work every day to help our customers — regardless of whether they are businesses, non-profits, governments, or hobbyists — run their corner of the Internet a little better. This bug exposed just how much of the Internet puts its trust in us. We know we disappointed you and we apologize. We will continue to share what we discover because we believe trust is critical and transparency is the foundation of that trust.