Over the last month, we’ve been watching some of the largest distributed denial of service (DDoS) attacks ever seen unfold. As CloudFlare has grown we've brought on line systems capable of absorbing and accurately measuring attacks. Since we don't need to resort to crude techniques to block traffic we can measure and filter attacks with accuracy. Our systems sort bad packets from good, keep websites online and keep track of attack packet rates and bits per second.
The current spate of large attacks are all layer 3 (L3) DDoS. Layer 3 attacks consist of a large volume of packets hitting the target network, and the aim is usually to overwhelm the target network hardware or connectivity.
L3 attacks are dangerous because most of the time the only solution is to acquire large network capacity and buy beefy networking hardware, which is simply not an option for most independent website operators. Or, faced with huge packet rates, some providers simply turn off connections or entirely block IP addresses.
A Typical Day At CloudFlare
Historically, L3 attacks were the biggest headache for CloudFlare. Over the last two years, we’ve automated almost all of our L3 attack handling and these automatic systems protect CloudFlare customers worldwide 24 hours a day.
This chart shows our L3 DoS mitigation during the last quarter of 2015:
The y-axis is the number of individual "DoS events" that we responded to, which is about 20-80 on a typical day. Each of these DoS events are automatically triggered by an attack on one or more of our customers.
Recent DDoS Attacks Were Larger. Much Larger.
Most of the mitigated DoS events are small. When a big attack happens, it usually shows up as a large number of separate events in our system.
During the last month or so, we’e been dealing (mostly automatically) with very large attacks. Here is the same chart, but including the first quarter of 2016. Notice the scale:
That’s about a 15x increase in individual DoS events. These new attacks are interesting for a couple of reasons. First, the spikes align with the weekends. It seems the attackers are busy with something else during the week. Second, they are targeting a couple of fairly benign websites—this demonstrates that anybody can become the target of a large attack. Third, the overall volume of the attack is enormous.
Let me elaborate on this last point.
BGP Black Holes
When operating at a smaller scale, it’s not unusual for a DDoS attack to overwhelm the capacity of the target network. This causes network congestion and forces the operators of the attacked network to black hole the attacked IP addresses, pretty much removing them from the Internet.
After that happens, it’s impossible to report the volume of the attack. The attack traffic will disappear in a “black hole” and is invisible to the operators of the attacked network. This is why reporting on big attacks is difficult—with BGP blackholing, it’s impossible to assess the true scale of the malicious traffic.
We’ve encountered very large attacks in the past, and we didn’t often have to resort to blackholing. This allows us to report in detail on the attack volume we see. Fortunately, the same is true this time around—the size of our network and the efficiency of our systems allowed us to simply absorb the attacks. This is the only reason we’re able to provide the following metrics.
How Big Was This DDoS Attack?
DDoS attacks are measured in two dimensions: the number of malicious packets per second (pps) and the attack bandwidth in bits per second (bps).
Rate of Packets (pps)
The packets per second metric is important, since the processing power required on a router raises proportionally with the pps count. Whenever an attack overwhelms a router’s processing power, we will see a frantic error message like this:
PROBLEM/CRITICAL: edge21.sin01.cloudflare.cc router PFE hardware drops Info cell drops are increasing at a rate of 210106.35/s. Fabric drops are increasing at a rate of 329678.81/s.
It’s not uncommon for attacks against CloudFlare to reach 100Mpps. The recent attacks were larger and peaked at 180Mpps. Here’s a chart of the attack packets per second that we received over the last month:
There aren’t many networks in the world that could sustain this rate of attack packets, but our hardware was able to keep up with all the malicious traffic. Our networking team does an excellent job at keeping CloudFlare’s networking hardware up to the task.
Volume in bits (bps)
The other interesting parameter is size of the attack in gigabits per second. Some large DDoS attacks attempt to saturate the network capacity (bps), rather than router processing power (pps). If that happens, we see alerts like this:
PROBLEM/CRITICAL: edge112.tel01.cloudflare.cc router interfaces ae-1/2/2 (HardLayer) in: 81092mbps (81.10%)
We saw this warning some time ago when a link got quite full. In fact, in recent weeks a couple of our fat 100Gbps links got quite busy, hitting a ceiling at 77Gbps on inbound:
We speculate the attack against this datacenter was larger than 77Gbps, but that congestion occurred on the other side of the network—close to the attackers. We’re investigating this with our Internet providers.
The recent attacks peaked at around 400Gbps of aggregate inbound traffic. The peak wasn't a one-off spike, it lasted for a couple of hours!
Even during the peak attack traffic, our network wasn’t congested and systems operated normally. Our automatic mitigation software was able to sort the good from the bad packets.
Over the past weeks, we encountered a series of large DDoS attacks. In peak, our systems reported over 400Gbps of incoming traffic, which is amongst the largest by aggregate volume that we’ve ever seen. The attack was mostly absorbed by our automatic mitigation systems.
With each attack, we’re improving our automatic attack mitigation system. In addition, our network team constantly keeps an eye on the network, working to identify new bottlenecks. Without our automatic mitigation system and network team, dealing with attacks of this size would not be possible.
You might think that as CloudFlare grows, the relative scale of attacks will shrink in comparison to legitimate traffic. The funny thing is, the opposite is true. Every time we upgrade our network capacity, we see larger attacks because we don’t have to resort to blackholing. With more capacity, we’re able to withstand and accurately measure even bigger attacks.
Interested in dealing with the largest DDoS attacks in the world? We’re hiring in San Francisco, London, and Singapore.