One of the behind-the-scenes topics we think about a lot at CloudFlare is how to handle abuse of our network. I realized that we hadn't exposed our thoughts on this clearly enough. In the next few days, we'll be making some minor updates to our Terms of Service to better align them with how we handle abuse complaints. However, I wanted to take the time to write up a post on how we think about abuse. Make sure you're comfy; this is going to be a bit of a marathon post because it's an important and complicated issue.
CloudFlare sits in front of nearly half a million websites. Those websites include banks, national governments, Fortune 500 companies, universities, media publications, blogs, ecommerce companies, and just about everything else you can find online. Every day we process more page views through our network than Amazon.com, Wikipedia, Twitter, Zynga, Aol, eBay, PayPal, Apple, and Instagram — combined. That's astounding given that our public launch was only a year and a half ago.
While the vast majority of sites on CloudFlare are not problematic, just like on the Internet itself there are inevitably some unsavory organizations on our network. Almost exactly a year ago, I blogged about the notorious hacking group LulzSec using CloudFlare's services and our decision not to terminate their service. As I wrote a year ago:
CloudFlare is firm in our belief that our role is not that of Internet censor. There are tens of thousands of websites currently using CloudFlare's network. Some of them contain information I find troubling. Such is the nature of a free and open network and, as an organization that aims to make the whole Internet faster and safer, such inherently will be our ongoing struggle. While we will respect the laws of the jurisdictions in which we operate, we do not believe it is our decision to determine what content may and may not be published. That is a slippery slope down which we will not tread.
Today there are hundreds of thousands of sites using CloudFlare and we remain concerned about the slippery slope. To be clear, this isn't a financial decision for us. LulzSec and other problematic customers tend to sign up for our free service and we don't make a dime off of them. When they upgrade they usually pay with stolen credit cards, which causes us significant headaches. The decision to err on the side of not terminating sites is a philosophical one: we are rebuilding the Internet, and we don't believe that we or anyone else should have the right to tell people what content they can and cannot publish online.
Who We Terminate
There is no more thankless job than running an abuse desk. In the last week, our abuse team has had to deal with "senior Iranian officials" threatening us over the fact that a pro-Israeli website was on our network while, at the same time, dealing with threats from an Israeli group who was extremely upset that a website supporting the Iranian regime was also on our network. We didn't terminate either of those sites.
No matter how repugnant an idea may be to one person or another, we don't believe we are qualified to act as judge. There are, however, at least two clear cases where we believe our network can cause harm and therefore we do take action: spreading malware or powering phishing sites.
Originally, when we would receive reports of phishing or malware we would terminate the customers immediately. The challenge was that this didn't actually solve the problem. Since we're just a proxy, not the host, us terminating the customer doesn't make the harmful content disappear. Terminating the site effectively just kicked the problem further down the road, moving it off our network and onto someone else's.
This was unsatisfying to our abuse team, so we reached out to the experts on malware and phishing at StopBadware. StopBadware is the organization Google trusts to educate users about phishing and malware when problems are detected on pages that appear in the company's search index. We worked with StopBadware to design a Google-like block page that we can display on pages where malware or phishing is detected. This solution actually eliminates the known malware and phishing from our network and, at the same time, teaches visitors who may have been fooled by the malicious content about its risks.
This sounds easy — and, as a matter of policy, it was easy — but, technically, it was actually extremely tricky to implement. To give you some sense, we average about 150,000 requests per second through our network and we're doubling every 3 months or so. To make the block pages work, we needed to check every one of those requests against regular expressions that match known phishing or malware sites. All without slowing down requests. It took us longer than I would have liked to find a solution that could scale, but now that it is in place we are actively adding data sources to ensure we promptly remediate any malware and phishing sites on our network.
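The core of the check described above — screening every request against patterns for known-bad URLs — can be sketched roughly as follows. This is a simplified illustration, not CloudFlare's actual implementation; the pattern list, names, and URLs here are hypothetical, and a real system would pull patterns from external data feeds and optimize the matching far more aggressively.

```python
import re

# Hypothetical examples of known phishing/malware URL patterns.
# A real deployment would load thousands of these from trusted data sources.
BAD_PATTERNS = [
    r"^phish\.example/login",
    r"^malware-host\.example/.*\.exe$",
]

# Compiling all patterns into one alternation means each incoming request
# costs a single regex scan rather than a loop over every pattern.
BLOCKLIST = re.compile("|".join(f"(?:{p})" for p in BAD_PATTERNS))

def should_block(host: str, path: str) -> bool:
    """Return True if host+path matches a known phishing/malware pattern,
    in which case the proxy would serve a warning/block page instead of
    the origin content."""
    return BLOCKLIST.search(f"{host}{path}") is not None

print(should_block("phish.example", "/login"))       # blocked
print(should_block("example.com", "/index.html"))    # passed through
```

The hard part at CloudFlare's scale isn't the matching logic itself but doing it on every one of ~150,000 requests per second without adding measurable latency, which is why finding a scalable solution took longer than expected.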
The Rock and the Hard Place
While we believe we have found a good solution for malware and phishing abuse reports, other abuse requests still present vexing issues. Originally, when CloudFlare received a DMCA complaint for an alleged copyright infringement, our practice was to turn over the IP address of the site's host to the person filing the complaint. This allowed them to then take the issue up with the hosting provider.
CloudFlare has become very, very good at stopping online attacks, including DDoS attacks. As a result, people launching those attacks have begun looking for ways to bypass our protection. Starting about a year ago, we saw a spike in what turned out to be illegitimate DMCA requests. These requests looked technically correct and included all the required information, but the complainant wasn't the actual copyright holder — it was an individual looking to attack the site. As soon as we turned over the origin IP address, they would launch an attack, completely bypassing CloudFlare's protection. In other words, attackers were abusing our abuse process — a problem I wrote about when discussing how SOPA could make things even worse.
If there is a way to reliably tell the difference between a legitimate and an illegitimate DMCA abuse complaint, we haven't found it. As a result, we adjusted our abuse process in order to meet the requirements of the law and allow legitimate complainants to serve notice to infringers, while not exposing our customers to attacks.
In many ways, our abuse flow today is also a sort of reverse proxy. When we receive a complaint, after some checks to ensure its validity to the extent possible, we forward a copy of the complaint to the site owner via email. We also send a copy of the complaint to the site's hosting provider, including the site's origin IP address and instructions on how they can test to ensure that the site is, in fact, hosted on their network. We then respond to the complainant explaining how CloudFlare works, how we've relayed their complaint, and providing the identity of the site's actual host (although not the site's actual IP address).
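The fan-out described above can be sketched as a small routine. This is purely illustrative — the field names, email addresses, and structure are hypothetical, not CloudFlare's actual system — but it shows the key property: the hosting provider receives the origin IP, while the complainant does not.

```python
def relay_complaint(complaint: dict, site: dict) -> list[dict]:
    """Fan a validated abuse complaint out to three parties:
    the site owner, the hosting provider, and the complainant."""
    return [
        # The site owner gets a copy of the complaint.
        {"to": site["owner_email"],
         "body": complaint["text"]},
        # The hosting provider gets the complaint plus the origin IP,
        # so it can verify the site really lives on its network.
        {"to": site["host_abuse_email"],
         "body": complaint["text"],
         "origin_ip": site["origin_ip"]},
        # The complainant learns the host's identity but never the
        # origin IP, keeping the site shielded from direct attack.
        {"to": complaint["reporter_email"],
         "body": f"Your complaint was relayed; the site is hosted by {site['host_name']}."},
    ]

messages = relay_complaint(
    {"text": "Alleged infringement at example.com/page",
     "reporter_email": "reporter@example.org"},
    {"owner_email": "owner@example.com",
     "host_abuse_email": "abuse@host.example",
     "host_name": "Example Hosting Co.",
     "origin_ip": "203.0.113.7"},
)
```

The design choice worth noting is that the origin IP appears in exactly one of the three outgoing messages — the one to the party that can actually act on the content.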
We are continuing to refine the process over time to maximize two goals: ensuring our customers are protected from attacks, and ensuring that we don't stand in the way of legitimate complainants. If you have suggestions on how we can improve the process while balancing these interests, we welcome your input.