In the past, we’ve spoken about how Cloudflare is architected to sustain the largest DDoS attacks. During traffic surges we spread the traffic across a very large number of edge servers. This architecture allows us to avoid having a single choke point because the traffic gets distributed externally across multiple datacenters and internally across multiple servers. We do that by employing Anycast and ECMP.
We don't use separate scrubbing boxes or specialized hardware - every one of our edge servers can perform advanced traffic filtering if the need arises. This allows us to scale up our DDoS capacity as we grow. Each of the new servers we add to our datacenters increases our maximum theoretical DDoS “scrubbing” power. It also scales down nicely - in smaller datacenters we don't have to overinvest in expensive dedicated hardware.
During normal operations our attitude to attacks is rather pragmatic. Since the inbound traffic is distributed across hundreds of servers we can survive periodic spikes and small attacks without doing anything. Vanilla Linux is remarkably resilient against unexpected network events. This is especially true since kernel 4.4 when the performance of SYN cookies was greatly improved.
But at some point, malicious traffic volume can become so large that we must take the load off the networking stack. We have to minimize the amount of CPU spent on dealing with attack packets. Cloudflare operates a multi-tenant service and we must always have enough processing power to serve valid traffic. We can't afford to starve our HTTP proxy (nginx) or custom DNS server (named RRDNS, written in Go) of CPU. When the attack size crosses a predefined threshold (which varies greatly depending on specific attack type), we must intervene.
During large attacks we deploy mitigations to reduce the CPU consumed by malicious traffic. We have multiple layers of defense, each tuned to specific attack vector.
First, there is “scattering”. Since we control DNS resolution we are able to move the domains we serve between IP addresses (we call this "scattering"). This is an effective technique as long as the attacks don’t follow the updated DNS resolutions. This often happens for L3 attacks where the attacker has hardcoded the IP address of the target.
Next, there is a wide range of mitigation techniques that leverage iptables, the firewall built in to the Linux kernel. But we don't use it like a conventional firewall, with a static set of rules. We continuously add, tweak and remove rules, based on specific attack characteristics. Over the years we have mastered the most effective iptables extensions:
To make the most of iptables, we built a system to manage the iptables configuration across our entire fleet, allowing us to rapidly deploy rules everywhere. This fits our architecture nicely: due to Anycast, an attack against a single IP will be delivered to multiple locations. Running iptables rules for that IP on all servers makes sense.
Using stock iptables gives us plenty of confidence. When possible we prefer to use off-the-shelf tools to deal with attacks.
Sometimes though, even this is not sufficient. Iptables is fast in the general case, but has its limits. During very large attacks, exceeding 1M packets per second per server, we shift the attack traffic from kernel iptables to a kernel bypass user space program (which we call floodgate). We use a partial kernel bypass solution using Solarflare EF_VI interface. With this on each server we can process more than 5M attack packets per second while consuming only a single CPU core. With floodgate we have comfortable amount of CPU left for our applications, even during the largest network events.
Manual attack handling
Early on these mitigations were applied manually by our tireless System Reliability Engineers (SRE's). Unfortunately, it turns out that humans under stress... well, make mistakes. We learned it the hard way - one of the most famous incidents happened in March 2013 when a simple typo brought our whole network down.
Humans are also not great at applying precise rules. As our systems grew and mitigations became more complex, having many specific toggles, our SREs got overwhelmed by the details. It was challenging to present all the specific information about the attack to the operator. We often applied overly-broad mitigations, which were unnecessarily affecting legitimate traffic. All that changed with the introduction of Gatebot.
To aid our SREs we developed a fully automatic mitigation system. We call it Gatebot1.
The main goal of Gatebot was to automate as much of the mitigation workflow as possible. That means: to observe the network and note the anomalies, understand the targets of attacks and their metadata (such as the type of customer involved), and perform appropriate mitigation action.
Nowadays we have multiple Gatebot instances - we call them “mitigation pipelines”. Each pipeline has three parts:
1) “attack detection” or “signal” - A dedicated system detects anomalies in network traffic. This is usually done by sampling a small fraction of the network packets hitting our network, and analyzing them using streaming algorithms. With this we have a real-time view of the current status of the network. This part of the stack is written in Golang, and even though it only examines the sampled packets, it's pretty CPU intensive. It might comfort you to know that at this very moment two big Xeon servers burn all of their combined 48 Skylake CPU cores toiling away counting packets and performing sophisticated analytics looking for attacks.
2) “reactive automation” or “business logic”. For each anomaly (attack) we see who the target is, can we mitigate it, and with what parameters. Depending on the specific pipeline, the business logic may be anything from a trivial procedure to a multi-step process requiring a number of database lookups and potentially confirmation from a human operator. This code is not performance critical and is written in Python. To make it more accessible and readable by others in company, we developed a simple functional, reactive programming engine. It helps us to keep the code clean and understandable, even as we add more steps, more pipelines and more complex logic. To give you a flavor of the complexity: imagine how the system should behave if a customer upgraded a plan during an attack.
3) “mitigation”. The previous step feeds specific mitigation instructions into the centralized mitigation management systems. The mitigations are deployed across the world to our servers, applications, customer settings and, in some cases, to the network hardware.
Sleeping at night
Gatebot operates constantly, without breaks for lunch. For the iptables mitigations pipelines alone, Gatebot got engaged between 30 and 1500 times a day. Here is a chart of mitigations per day over last 6 months:
Gatebot is much faster and much more precise than even our most experienced SREs. Without Gatebot we wouldn’t be able to operate our service with the appropriate level of confidence. Furthermore, Gatebot has proved to be remarkably adaptable - we started by automating handling of Layer 3 attacks, but soon we proved that the general model works well for automating other things. Today we have more than 10 separate Gatebot instances doing everything from mitigating Layer 7 attacks to informing our Customer Support team of misbehaving customer origin servers.
Gatebot allows us to protect our users no matter of the plan. Whether you are a on a Free, Pro, Business or Enterprise plan, Gatebot is working for you. This is why we can afford to provide the same level of DDoS protection for all our customers3.
Dealing with attacks sound interesting? Join our world famous DDoS team in London, Austin, San Francisco and our elite office in Warsaw, Poland.
Fun fact: all our components in this area are called “gate-something”, like: gatekeeper, gatesetter, floodgate, gatewatcher, gateman... Who said that naming things must be hard? ↩
Some of us have argued that this system should be called Netbot. ↩
Note: there are caveats. Ask your Success Engineer for specifics! ↩