
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built and the technologies we use, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Sat, 04 Apr 2026 06:35:04 GMT</lastBuildDate>
        <item>
            <title><![CDATA[SLP: a new DDoS amplification vector in the wild]]></title>
            <link>https://blog.cloudflare.com/slp-new-ddos-amplification-vector/</link>
            <pubDate>Tue, 25 Apr 2023 13:07:56 GMT</pubDate>
            <description><![CDATA[ Researchers have recently published the discovery of a new DDoS reflection/amplification attack vector leveraging the SLP protocol. Cloudflare expects the prevalence of SLP-based DDoS attacks to rise in the coming weeks ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3I71ArpV1rGLMEvmAECNwa/1307863c865f182b789b3a3e1ea4f078/image13-1-4.png" />
            
            </figure><p>Earlier today, April 25, 2023, researchers Pedro Umbelino at <a href="https://www.bitsight.com/blog/new-high-severity-vulnerability-cve-2023-29552-discovered-service-location-protocol-slp">Bitsight</a> and Marco Lux at <a href="https://curesec.com/blog/article/CVE-2023-29552-Service-Location-Protocol-Denial-of-Service-Amplification-Attack-212.html">Curesec</a> published their discovery of CVE-2023-29552, a new <a href="https://www.cisa.gov/news-events/alerts/2014/01/17/udp-based-amplification-attacks">DDoS reflection/amplification attack vector</a> leveraging the SLP protocol. If you are a Cloudflare customer, your services are already protected from this new attack vector.</p><p><a href="https://en.wikipedia.org/wiki/Service_Location_Protocol">Service Location Protocol</a> (SLP) is a “service discovery” protocol invented by Sun Microsystems in 1997. Like other service discovery protocols, it was designed to allow devices in a local area network to interact without prior knowledge of each other. SLP is a relatively obsolete protocol and has mostly been supplanted by more modern alternatives like UPnP, mDNS/Zeroconf, and WS-Discovery. Nevertheless, many commercial products still offer support for SLP.</p><p>Since SLP has no method for authentication, it should never be exposed to the public Internet. However, Umbelino and Lux have discovered that upwards of 35,000 Internet endpoints have their devices’ SLP service exposed and accessible to anyone. Additionally, they have discovered that the UDP version of this protocol has an <a href="/reflections-on-reflections/">amplification factor</a> of up to 2,200x, which is the third largest discovered to date.</p><p>Cloudflare expects the prevalence of SLP-based DDoS attacks to rise significantly in the coming weeks as malicious actors learn how to exploit this newly discovered attack vector.</p>
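<p>For context, the "amplification factor" here is simply the ratio of response bytes reflected at the victim to request bytes sent by the attacker. A rough sketch (the numbers below are illustrative, not the researchers' measurements):</p>

```python
def amplification_factor(request_bytes: int, response_bytes: int) -> float:
    """Bandwidth amplification: bytes reflected at the victim per byte
    the attacker sends to the reflector."""
    return response_bytes / request_bytes

# Illustrative numbers only: a request of a few dozen bytes eliciting
# ~65 KB of registered-service data lands in the ~2,200x range.
print(round(amplification_factor(29, 65_000)))  # -> 2241
```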
    <div>
      <h3>Cloudflare customers are protected</h3>
      <a href="#cloudflare-customers-are-protected">
        
      </a>
    </div>
    <p>If you are a Cloudflare customer, our <a href="/deep-dive-cloudflare-autonomous-edge-ddos-protection/">automated DDoS protection system</a> already protects your services from these SLP amplification attacks. If you are a network operator, you should ensure that you are not exposing the SLP protocol directly to the public Internet, so that your devices cannot be exploited to launch these attacks. You should consider blocking UDP port 427 via access control lists or other means. This port is rarely used on the public Internet, meaning it is relatively safe to block without impacting legitimate traffic. Cloudflare <a href="https://developers.cloudflare.com/magic-transit/">Magic Transit</a> customers can use the <a href="https://developers.cloudflare.com/magic-firewall/">Magic Firewall</a> to craft and deploy such rules.</p> ]]></content:encoded>
            <category><![CDATA[CVE]]></category>
            <category><![CDATA[Vulnerabilities]]></category>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Attacks]]></category>
            <category><![CDATA[Mitigation]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">5wnL1ufYrN0dZqE5GMsDZ8</guid>
            <dc:creator>Alex Forster</dc:creator>
            <dc:creator>Omer Yoachimik</dc:creator>
        </item>
        <item>
            <title><![CDATA[How to drop 10 million packets per second]]></title>
            <link>https://blog.cloudflare.com/how-to-drop-10-million-packets/</link>
            <pubDate>Fri, 06 Jul 2018 13:00:00 GMT</pubDate>
            <description><![CDATA[ Internally our DDoS mitigation team is sometimes called "the packet droppers". When other teams build exciting products to do smart things with the traffic that passes through our network, we take joy in discovering novel ways of discarding it. ]]></description>
            <content:encoded><![CDATA[ <p>Internally our DDoS mitigation team is sometimes called "the packet droppers". When other teams build exciting products to do smart things with the traffic that passes through our network, we take joy in discovering novel ways of discarding it.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7yS2j3vUrdmAkLkGjcunWg/cabb3814dc89260f1338ee65cd38466f/38464589350_d00908ee98_b.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by-sa/2.0/">CC BY-SA 2.0</a> <a href="https://www.flickr.com/photos/beegee49/38464589350">image</a> by <a href="https://www.flickr.com/photos/beegee49">Brian Evans</a></p><p>Being able to quickly discard packets is very important to withstand DDoS attacks.</p><p>Dropping packets hitting our servers, as simple as it sounds, can be done on multiple layers. Each technique has its advantages and limitations. In this blog post we'll review all the techniques we tried thus far.</p>
    <div>
      <h3>Test bench</h3>
      <a href="#test-bench">
        
      </a>
    </div>
    <p>To illustrate the relative performance of the methods, we'll show some numbers. The benchmarks are synthetic, so take the numbers with a grain of salt. We'll use one of our Intel servers, with a 10Gbps network card. The hardware details aren't too important, since the tests are designed to show operating-system, not hardware, limitations.</p><p>Our testing setup is prepared as follows:</p><ul><li><p>We transmit a large number of tiny UDP packets, reaching 14Mpps (million packets per second).</p></li><li><p>This traffic is directed towards a single CPU on a target server.</p></li><li><p>We measure the number of packets handled by the kernel on that one CPU.</p></li></ul><p>We're not trying to maximize userspace application speed, nor packet throughput - instead, we're trying to specifically show kernel bottlenecks.</p><p>The synthetic traffic is prepared to put maximum stress on conntrack - it uses random source IP and port fields. Tcpdump will show it like this:</p>
            <pre><code>$ tcpdump -ni vlan100 -c 10 -t udp and dst port 1234
IP 198.18.40.55.32059 &gt; 198.18.0.12.1234: UDP, length 16
IP 198.18.51.16.30852 &gt; 198.18.0.12.1234: UDP, length 16
IP 198.18.35.51.61823 &gt; 198.18.0.12.1234: UDP, length 16
IP 198.18.44.42.30344 &gt; 198.18.0.12.1234: UDP, length 16
IP 198.18.106.227.38592 &gt; 198.18.0.12.1234: UDP, length 16
IP 198.18.48.67.19533 &gt; 198.18.0.12.1234: UDP, length 16
IP 198.18.49.38.40566 &gt; 198.18.0.12.1234: UDP, length 16
IP 198.18.50.73.22989 &gt; 198.18.0.12.1234: UDP, length 16
IP 198.18.43.204.37895 &gt; 198.18.0.12.1234: UDP, length 16
IP 198.18.104.128.1543 &gt; 198.18.0.12.1234: UDP, length 16</code></pre>
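<p>The randomized flows in that capture could be generated with a sketch like this (our own toy illustration, not the traffic generator actually used in these tests):</p>

```python
import ipaddress
import random

# Sources are drawn from 198.18.0.0/15, the RFC 2544 benchmarking range
# visible in the capture above.
BENCH_NET = ipaddress.ip_network("198.18.0.0/15")

def random_flow(dst: str = "198.18.0.12", dport: int = 1234):
    """One random (src, sport, dst, dport) tuple - conntrack's worst case."""
    offset = random.randrange(BENCH_NET.num_addresses)
    src = ipaddress.IPv4Address(int(BENCH_NET.network_address) + offset)
    sport = random.randrange(1024, 65536)
    return str(src), sport, dst, dport
```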
            <p>On the target side all of the packets are going to be forwarded to exactly one RX queue, therefore one CPU. We do this with hardware flow steering:</p>
            <pre><code>ethtool -N ext0 flow-type udp4 dst-ip 198.18.0.12 dst-port 1234 action 2</code></pre>
            <p>Benchmarking is always hard. When preparing the tests we learned that having any active raw sockets destroys performance. It's obvious in hindsight, but easy to miss. Before running any tests remember to make sure you don't have any stale <code>tcpdump</code> process running. This is how to check it, showing a bad process active:</p>
            <pre><code>$ ss -A raw,packet_raw -l -p|cat
Netid  State      Recv-Q Send-Q Local Address:Port
p_raw  UNCONN     525157 0      *:vlan100          users:(("tcpdump",pid=23683,fd=3))</code></pre>
            <p>Finally, we are going to disable the Intel Turbo Boost feature on the machine:</p>
            <pre><code>echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo</code></pre>
            <p>While Turbo Boost is nice and increases throughput by at least 20%, it also drastically worsens the standard deviation in our tests. With Turbo enabled we had ±1.5% deviation in our numbers; with Turbo off this falls to a manageable 0.25%.</p>
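<p>The deviation figures quoted here are relative standard deviations of repeated throughput readings; a trivial helper (our own sketch with hypothetical readings, not part of the original test harness):</p>

```python
import statistics

def rel_stddev_pct(samples):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * statistics.stdev(samples) / statistics.mean(samples)

# Hypothetical pps readings: with Turbo on, the spread is several times
# wider than with Turbo off.
turbo_on  = [9.85e6, 10.00e6, 10.15e6]
turbo_off = [9.975e6, 10.000e6, 10.025e6]
```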
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5oJAi3KKpj9clrG5Abebho/59ed605de89f49183c5b360a4b50d30c/layers.JPG.jpeg" />
            
            </figure>
    <div>
      <h4>Step 1. Dropping packets in application</h4>
      <a href="#step-1-dropping-packets-in-application">
        
      </a>
    </div>
    <p>Let's start with the idea of delivering packets to an application and ignoring them in userspace code. For the test setup, let's make sure our iptables don't affect the performance:</p>
            <pre><code>iptables -I PREROUTING -t mangle -d 198.18.0.12 -p udp --dport 1234 -j ACCEPT
iptables -I PREROUTING -t raw -d 198.18.0.12 -p udp --dport 1234 -j ACCEPT
iptables -I INPUT -t filter -d 198.18.0.12 -p udp --dport 1234 -j ACCEPT</code></pre>
            <p>The application code is a simple loop, receiving data and immediately discarding it in the userspace:</p>
            <pre><code>s = socket.socket(AF_INET, SOCK_DGRAM)
s.bind(("0.0.0.0", 1234))
while True:
    s.recvmmsg([...])  # pseudocode - recvmmsg(2) receives a whole batch of packets per syscall</code></pre>
            <p><a href="https://github.com/cloudflare/cloudflare-blog/blob/master/2018-07-dropping-packets/recvmmsg-loop.c">We prepared the code</a>; to run it:</p>
            <pre><code>$ ./dropping-packets/recvmmsg-loop
packets=171261 bytes=1940176</code></pre>
            <p>This setup allows the kernel to receive a meagre 175kpps from the hardware receive queue, as measured by <code>ethtool</code> and using our simple <a href="/three-little-tools-mmsum-mmwatch-mmhistogram/"><code>mmwatch</code> tool</a>:</p>
            <pre><code>$ mmwatch 'ethtool -S ext0|grep rx_2'
 rx2_packets: 174.0k/s</code></pre>
            <p>The hardware technically gets 14Mpps off the wire, but it's impossible to pass it all to a single RX queue handled by only one CPU core doing kernel work. <code>mpstat</code> confirms this:</p>
            <pre><code>$ watch 'mpstat -u -I SUM -P ALL 1 1|egrep -v Aver'
01:32:05 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
01:32:06 PM    0    0.00    0.00    0.00    2.94    0.00    3.92    0.00    0.00    0.00   93.14
01:32:06 PM    1    2.17    0.00   27.17    0.00    0.00    0.00    0.00    0.00    0.00   70.65
01:32:06 PM    2    0.00    0.00    0.00    0.00    0.00  100.00    0.00    0.00    0.00    0.00
01:32:06 PM    3    0.95    0.00    1.90    0.95    0.00    3.81    0.00    0.00    0.00   92.38</code></pre>
            <p>As you can see, the application code is not the bottleneck, using 27% sys + 2% userspace on CPU #1, while the network SOFTIRQ on CPU #2 uses 100% of its resources.</p><p>By the way, using <code>recvmmsg(2)</code> is important. In these post-Spectre days syscalls have become more expensive, and indeed we run kernel 4.14 with KPTI and retpolines:</p>
            <pre><code>$ tail -n +1 /sys/devices/system/cpu/vulnerabilities/*
==&gt; /sys/devices/system/cpu/vulnerabilities/meltdown &lt;==
Mitigation: PTI

==&gt; /sys/devices/system/cpu/vulnerabilities/spectre_v1 &lt;==
Mitigation: __user pointer sanitization

==&gt; /sys/devices/system/cpu/vulnerabilities/spectre_v2 &lt;==
Mitigation: Full generic retpoline, IBPB, IBRS_FW</code></pre>
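<p>To see why batching matters, here is a back-of-the-envelope model (our own illustration with made-up costs, not a measurement): if every syscall pays a fixed post-Spectre overhead, <code>recvmmsg(2)</code> amortizes that cost over a whole batch:</p>

```python
def max_pps(syscall_overhead_us: float, batch_size: int) -> float:
    """Ceiling on packets/s if the only per-packet cost were the
    (amortized) syscall overhead. A toy model, not a benchmark."""
    return batch_size / (syscall_overhead_us * 1e-6)

# With a hypothetical 2 us of syscall overhead, one packet per recvmsg(2)
# caps out near 0.5 Mpps, while a 64-packet recvmmsg(2) batch lifts the
# same ceiling to roughly 32 Mpps.
```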
            
    <div>
      <h4>Step 2. Slaughter conntrack</h4>
      <a href="#step-2-slaughter-conntrack">
        
      </a>
    </div>
    <p>We specifically designed the test - by choosing random source IP and ports - to put stress on the conntrack layer. This can be verified by looking at the number of conntrack entries, which during the test reaches the maximum:</p>
            <pre><code>$ conntrack -C
2095202

$ sysctl net.netfilter.nf_conntrack_max
net.netfilter.nf_conntrack_max = 2097152</code></pre>
            <p>You can also observe conntrack shouting in <code>dmesg</code>:</p>
            <pre><code>[4029612.456673] nf_conntrack: nf_conntrack: table full, dropping packet
[4029612.465787] nf_conntrack: nf_conntrack: table full, dropping packet
[4029617.175957] net_ratelimit: 5731 callbacks suppressed</code></pre>
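<p>What is happening here can be modelled in a few lines (a toy simulation of ours, not how conntrack is actually implemented): each new random source tuple allocates an entry until <code>nf_conntrack_max</code> is reached, after which new flows are dropped:</p>

```python
import random

def fill_conntrack(n_packets, table_max):
    """Toy model of the conntrack table under a randomized flood.
    Returns (entries, dropped)."""
    table, dropped = set(), 0
    for _ in range(n_packets):
        flow = (random.getrandbits(32), random.randrange(65536))  # src ip, sport
        if flow in table:
            continue                  # existing flow, no new entry needed
        if len(table) >= table_max:
            dropped += 1              # "nf_conntrack: table full, dropping packet"
        else:
            table.add(flow)
    return len(table), dropped
```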
            <p>To speed up our tests let's disable it:</p>
            <pre><code>iptables -t raw -I PREROUTING -d 198.18.0.12 -p udp -m udp --dport 1234 -j NOTRACK</code></pre>
            <p>And rerun the tests:</p>
            <pre><code>$ ./dropping-packets/recvmmsg-loop
packets=331008 bytes=5296128</code></pre>
            <p>This instantly bumps the application receive performance to 333kpps. Hurray!</p><p>PS. With SO_BUSY_POLL we can bump the numbers to 470kpps, but this is a subject for another time.</p>
    <div>
      <h4>Step 3. BPF drop on a socket</h4>
      <a href="#step-3-bpf-drop-on-a-socket">
        
      </a>
    </div>
    <p>Going further, why deliver packets to a userspace application at all? While this technique is uncommon, we can attach a classic BPF filter to a SOCK_DGRAM socket with <code>setsockopt(SO_ATTACH_FILTER)</code> and program the filter to discard packets in kernel space.</p><p><a href="https://github.com/cloudflare/cloudflare-blog/blob/master/2018-07-dropping-packets/bpf-drop.c">See the code</a>; to run it:</p>
            <pre><code>$ ./bpf-drop
packets=0 bytes=0</code></pre>
            <p>With drops in BPF (classic BPF and extended eBPF have similar performance here) we process roughly 512kpps. All of the packets get dropped in the BPF filter while still in software interrupt mode, which saves us the CPU needed to wake up the userspace application.</p>
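<p>The attachment itself is a single <code>setsockopt(2)</code> call. A minimal Python sketch of the technique (Linux-only; the option number and the one-instruction drop-everything program are hardcoded here, and the real filter linked above matches on much more than this):</p>

```python
import ctypes
import socket
import struct
import sys

SO_ATTACH_FILTER = 26  # Linux-specific socket option number

def drop_all_fprog():
    """Build a one-instruction classic-BPF program, BPF_RET|BPF_K with k=0:
    accept zero bytes of every packet, i.e. drop everything in-kernel."""
    insn = struct.pack("HBBI", 0x06, 0, 0, 0)       # code, jt, jf, k
    buf = ctypes.create_string_buffer(insn, len(insn))
    # struct sock_fprog { unsigned short len; struct sock_filter *filter; }
    fprog = struct.pack("HN", 1, ctypes.addressof(buf))
    return fprog, buf  # keep buf referenced while the kernel copies it

if sys.platform == "linux":
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("127.0.0.1", 0))
    fprog, _buf = drop_all_fprog()
    s.setsockopt(socket.SOL_SOCKET, SO_ATTACH_FILTER, fprog)
    s.close()  # datagrams sent here would now die in softirq context
```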
    <div>
      <h4>Step 4. iptables DROP after routing</h4>
      <a href="#step-4-iptables-drop-after-routing">
        
      </a>
    </div>
    <p>As a next step we can simply drop packets in the iptables firewall INPUT chain by adding a rule like this:</p>
            <pre><code>iptables -I INPUT -d 198.18.0.12 -p udp --dport 1234 -j DROP</code></pre>
            <p>Remember we disabled conntrack already with <code>-j NOTRACK</code>. These two rules give us 608kpps.</p><p>The numbers in iptables counters:</p>
            <pre><code>$ mmwatch 'iptables -L -v -n -x | head'

Chain INPUT (policy DROP 0 packets, 0 bytes)
    pkts      bytes target     prot opt in     out     source               destination
605.9k/s    26.7m/s DROP       udp  --  *      *       0.0.0.0/0            198.18.0.12          udp dpt:1234</code></pre>
            <p>600kpps is not bad, but we can do better!</p>
    <div>
      <h4>Step 5. iptables DROP in PREROUTING</h4>
      <a href="#step-5-iptables-drop-in-prerouting">
        
      </a>
    </div>
    <p>An even faster technique is to drop packets before they get routed. This rule can do this:</p>
            <pre><code>iptables -I PREROUTING -t raw -d 198.18.0.12 -p udp --dport 1234 -j DROP</code></pre>
            <p>This produces a whopping 1.688mpps.</p><p>This is quite a significant jump in performance that I don't fully understand. Either our routing layer is unusually complex or there is a bug in our server configuration.</p><p>In any case, the "raw" iptables table is definitely way faster.</p>
    <div>
      <h4>Step 6. nftables DROP before CONNTRACK</h4>
      <a href="#step-6-nftables-drop-before-conntrack">
        
      </a>
    </div>
    <p>Iptables is considered passé these days. The new kid in town is nftables. See this <a href="https://www.youtube.com/watch?v=9Zr8XqdET1c">video for a technical explanation why</a> nftables is superior. Nftables promises to be faster than gray-haired iptables for many reasons; among them is a rumor that retpolines (aka: no speculation for indirect jumps) hurt iptables quite badly.</p><p>Since this article is not about comparing nftables vs iptables speed, let's try only the fastest drop I could come up with:</p>
            <pre><code>nft add table netdev filter
nft -- add chain netdev filter input { type filter hook ingress device vlan100 priority -500 \; policy accept \; }
nft add rule netdev filter input ip daddr 198.18.0.0/24 udp dport 1234 counter drop
nft add rule netdev filter input ip6 daddr fd00::/64 udp dport 1234 counter drop</code></pre>
            <p>The counters can be seen with this command:</p>
            <pre><code>$ mmwatch 'nft --handle list chain netdev filter input'
table netdev filter {
    chain input {
        type filter hook ingress device vlan100 priority -500; policy accept;
        ip daddr 198.18.0.0/24 udp dport 1234 counter packets    1.6m/s bytes    69.6m/s drop # handle 2
        ip6 daddr fd00::/64 udp dport 1234 counter packets 0 bytes 0 drop # handle 3
    }
}</code></pre>
            <p>The nftables "ingress" hook yields around 1.53mpps. This is slightly slower than iptables in the PREROUTING layer, which is puzzling - theoretically "ingress" happens before PREROUTING, so it should be faster.</p><p>In our test nftables was slightly slower than iptables, but not by much. Nftables is still better :P</p>
    <div>
      <h4>Step 7. tc ingress handler DROP</h4>
      <a href="#step-7-tc-ingress-handler-drop">
        
      </a>
    </div>
    <p>A somewhat surprising fact is that a tc (traffic control) ingress hook happens even before PREROUTING. tc makes it possible to select packets based on basic criteria and, indeed, to drop them with an action. The syntax is rather hacky, so it's recommended to <a href="https://github.com/netoptimizer/network-testing/blob/master/bin/tc_ingress_drop.sh">use this script</a> to set it up. We need a slightly more complex tc match; here is the command line:</p>
            <pre><code>tc qdisc add dev vlan100 ingress
tc filter add dev vlan100 parent ffff: prio 4 protocol ip u32 match ip protocol 17 0xff match ip dport 1234 0xffff match ip dst 198.18.0.0/24 flowid 1:1 action drop
tc filter add dev vlan100 parent ffff: protocol ipv6 u32 match ip6 dport 1234 0xffff match ip6 dst fd00::/64 flowid 1:1 action drop</code></pre>
            <p>We can verify it:</p>
            <pre><code>$ mmwatch 'tc -s filter  show dev vlan100  ingress'
filter parent ffff: protocol ip pref 4 u32 
filter parent ffff: protocol ip pref 4 u32 fh 800: ht divisor 1 
filter parent ffff: protocol ip pref 4 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1  (rule hit   1.8m/s success   1.8m/s)
  match 00110000/00ff0000 at 8 (success   1.8m/s ) 
  match 000004d2/0000ffff at 20 (success   1.8m/s ) 
  match c612000c/ffffffff at 16 (success   1.8m/s ) 
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1 installed 1.0/s sec
        Action statistics:
        Sent    79.7m/s bytes   1.8m/s pkt (dropped   1.8m/s, overlimits 0 requeues 0) 
        backlog 0b 0p requeues 0</code></pre>
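<p>Those hex match values decode straight back to our filter criteria; a quick sanity check of ours:</p>

```python
import ipaddress

def decode_u32_ip(value: int) -> str:
    """Interpret a 32-bit tc-u32 match value as a dotted-quad IPv4 address."""
    return str(ipaddress.IPv4Address(value))

assert decode_u32_ip(0xC612000C) == "198.18.0.12"  # the match "at 16"
assert 0x04D2 == 1234                              # dst port, the match "at 20"
assert 0x11 == 17                                  # IPPROTO_UDP, in the word "at 8"
```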
            <p>A tc ingress hook with u32 match allows us to drop 1.8mpps on a single CPU. This is brilliant!</p><p>But we can go even faster...</p>
    <div>
      <h4>Step 8. XDP_DROP</h4>
      <a href="#step-8-xdp_drop">
        
      </a>
    </div>
    <p>Finally, the ultimate weapon is XDP - <a href="https://prototype-kernel.readthedocs.io/en/latest/networking/XDP/">eXpress Data Path</a>. With XDP we can run eBPF code in the context of a network driver. Most importantly, this is before the <code>skbuff</code> memory allocation, allowing great speeds.</p><p>Usually XDP projects have two parts:</p><ul><li><p>the eBPF code loaded into the kernel context</p></li><li><p>the userspace loader, which loads the code onto the right network card and manages it</p></li></ul><p>Writing the loader is pretty hard, so instead we can use the <a href="https://cilium.readthedocs.io/en/latest/bpf/#iproute2">new <code>iproute2</code> feature</a> and load the code with this trivial command:</p>
            <pre><code>ip link set dev ext0 xdp obj xdp-drop-ebpf.o</code></pre>
            <p>Tadam!</p><p>The source code for <a href="https://github.com/cloudflare/cloudflare-blog/blob/master/2018-07-dropping-packets/xdp-drop-ebpf.c">the loaded eBPF XDP program is available here</a>. The program parses IP packets and looks for the desired characteristics: IP transport, UDP protocol, target subnet, and destination port:</p>
            <pre><code>if (h_proto == htons(ETH_P_IP)) {
    if (iph-&gt;protocol == IPPROTO_UDP
        &amp;&amp; (htonl(iph-&gt;daddr) &amp; 0xFFFFFF00) == 0xC6120000 // 198.18.0.0/24
        &amp;&amp; udph-&gt;dest == htons(1234)) {
        return XDP_DROP;
    }
}</code></pre>
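<p>The same mask-and-compare can be restated in ordinary Python (our own mirror of the logic, not the eBPF source):</p>

```python
import socket
import struct

def xdp_would_drop(daddr: bytes, dport: int) -> bool:
    """Mirror of the eBPF check above: destination address in 198.18.0.0/24
    and UDP destination port 1234. daddr is 4 bytes in network byte order."""
    (ip,) = struct.unpack("!I", daddr)   # big-endian read, like htonl(iph->daddr)
    return (ip & 0xFFFFFF00) == 0xC6120000 and dport == 1234

assert xdp_would_drop(socket.inet_aton("198.18.0.12"), 1234)
assert not xdp_would_drop(socket.inet_aton("198.19.0.12"), 1234)
assert not xdp_would_drop(socket.inet_aton("198.18.0.12"), 53)
```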
            <p>The XDP program needs to be compiled with a modern <code>clang</code> that can emit BPF bytecode. After this, we can load and verify the running XDP program:</p>
            <pre><code>$ ip link show dev ext0
4: ext0: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 xdp qdisc fq state UP mode DEFAULT group default qlen 1000
    link/ether 24:8a:07:8a:59:8e brd ff:ff:ff:ff:ff:ff
    prog/xdp id 5 tag aedc195cc0471f51 jited</code></pre>
            <p>And see the numbers in <code>ethtool -S</code> network card statistics:</p>
            <pre><code>$ mmwatch 'ethtool -S ext0|egrep "rx"|egrep -v ": 0"|egrep -v "cache|csum"'
     rx_out_of_buffer:     4.4m/s
     rx_xdp_drop:         10.1m/s
     rx2_xdp_drop:        10.1m/s</code></pre>
            <p>Whooa! With XDP we can drop 10 million packets per second on a single CPU.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5SE8JnlqRwU9PvWk57Drtp/f399e77d9a09e6ca767caf3818d35bee/225821241_ed5da2da91_o.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by-sa/2.0/">CC BY-SA 2.0</a> <a href="https://www.flickr.com/photos/afiler/225821241/">image</a> by <a href="https://www.flickr.com/photos/afiler/">Andrew Filer</a></p>
    <div>
      <h3>Summary</h3>
      <a href="#summary">
        
      </a>
    </div>
    <p>We repeated these tests for both IPv4 and IPv6 and prepared this chart:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7gFSZtRxYtyzjQwOjcfXE/7815d540a5dcd04bacd0821d2e7067fc/numbers-noxdp.png" />
            
            </figure><p>Generally speaking, in our setup IPv6 had slightly lower performance. Remember that IPv6 packets are slightly larger, so some performance difference is unavoidable.</p><p>Linux has numerous hooks that can be used to filter packets, each with different performance and ease of use characteristics.</p><p>For DDoS purposes, it may be totally reasonable to just receive the packets in the application and process them in userspace. Properly tuned applications can get pretty decent numbers.</p><p>For DDoS attacks with random/spoofed source IPs, it might be worthwhile to disable conntrack to gain some speed. Be careful though - there are attacks for which conntrack is very helpful.</p><p>In other circumstances it may make sense to integrate the Linux firewall into the DDoS mitigation pipeline. In such cases, remember to put the mitigations in a "-t raw PREROUTING" layer, since it's significantly faster than the "filter" table.</p><p>For even more demanding workloads, we always have XDP. And boy, it is powerful. Here is the same chart as above, but including XDP:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2OUdMjazlfP6gpm6usqmj7/20f07e262782d9ef5c67ca46a603fed6/numbers-xdp-1.png" />
            
            </figure><p>If you want to reproduce these numbers, <a href="https://github.com/cloudflare/cloudflare-blog/blob/master/2018-07-dropping-packets/README.md">see the README where we documented everything</a>.</p><p>Here at Cloudflare we are using... almost all of these techniques. Some of the userspace tricks are integrated with our applications. The iptables layer is managed by <a href="/meet-gatebot-a-bot-that-allows-us-to-sleep/">our Gatebot DDoS pipeline</a>. Finally, we are working on replacing our proprietary kernel offload solution with XDP.</p><p>Want to help us drop more packets? We're hiring for many roles, including packet droppers, systems engineers and more!</p><p><i>Special thanks to </i><a href="https://twitter.com/JesperBrouer"><i>Jesper Dangaard Brouer</i></a><i> for helping with this work.</i></p> ]]></content:encoded>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Mitigation]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Attacks]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">lrRGZKcpgb4NjHk3esKtY</guid>
            <dc:creator>Marek Majkowski</dc:creator>
        </item>
        <item>
            <title><![CDATA[Today we mitigated 1.1.1.1]]></title>
            <link>https://blog.cloudflare.com/today-we-mitigated-1-1-1-1/</link>
            <pubDate>Fri, 01 Jun 2018 01:13:53 GMT</pubDate>
            <description><![CDATA[ Cloudflare is protected from attacks by the Gatebot DDoS mitigation pipeline. Gatebot performs hundreds of mitigations a day, shielding our infrastructure and our customers from L3 and L7 attacks.  ]]></description>
            <content:encoded><![CDATA[ <p>On May 31, 2018, we had a 17-minute outage on our 1.1.1.1 resolver service; this was our doing and not the result of an attack.</p><p>Cloudflare is protected from attacks by the Gatebot DDoS mitigation pipeline. Gatebot performs hundreds of mitigations a day, shielding our infrastructure and our customers from L3/L4 and L7 attacks. Here is a chart of the count of daily Gatebot actions this year:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4afmhlOoruRjYpiDVEnb6L/1c58c4fab1a06fd06f61b9bbc6a62ee5/gatebot-stats.png" />
            
            </figure><p>In the past, we have blogged about our systems:</p><ul><li><p><a href="/meet-gatebot-a-bot-that-allows-us-to-sleep/">Meet Gatebot, a bot that allows us to sleep</a></p></li></ul><p>Today, things didn't go as planned.</p>
    <div>
      <h3>Gatebot</h3>
      <a href="#gatebot">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6pW1whDtkORpQAEePgmHo/63c2705e59c6eca69a413b190604b955/gatebot-parts.png" />
            
            </figure><p>Cloudflare’s network is large, handles many different types of traffic and mitigates different types of known and not-yet-seen attacks. The Gatebot pipeline manages this complexity in three separate stages:</p><ul><li><p><i>attack detection</i> - collects live traffic measurements across the globe and detects attacks</p></li><li><p><i>reactive automation</i> - chooses appropriate mitigations</p></li><li><p><i>mitigations</i> - executes mitigation logic on the edge</p></li></ul><p>The benign-sounding "reactive automation" part is actually the most complicated stage in the pipeline. We expected that from the start, which is why we implemented this stage using a custom <a href="https://en.wikipedia.org/wiki/Functional_reactive_programming">Functional Reactive Programming (FRP)</a> framework. If you want to know more about it, see <a href="https://idea.popcount.org/2016-02-01-enigma---building-a-dos-mitigation-pipeline/">the talk</a> and <a href="https://speakerdeck.com/majek04/gatelogic-somewhat-functional-reactive-framework-in-python">the presentation</a>.</p><p>Our mitigation logic often combines multiple inputs from different internal systems, to come up with the best, most appropriate mitigation. One of the most important inputs is the metadata about our IP address allocations: we mitigate attacks hitting HTTP and DNS IP ranges differently. Our FRP framework allows us to express this in clear and readable code. For example, this is part of the code responsible for performing DNS attack mitigation:</p>
            <pre><code>def action_gk_dns(...):

    [...]

    if port != 53:
        return None

    if whitelisted_ip.get(ip):
        return None

    if ip not in ANYCAST_IPS:
        return None
        [...] </code></pre>
            <p>It's the last check in this code that we tried to improve today.</p><p>Clearly, the code above is a huge oversimplification of all that goes into attack mitigation, but making an early decision about whether the attacked IP serves DNS traffic or not is important. It's that check that went wrong today. If the IP does serve DNS traffic then attack mitigation is handled differently from IPs that never serve DNS.</p>
    <div>
      <h3>Cloudflare is growing, so must Gatebot</h3>
      <a href="#cloudflare-is-growing-so-must-gatebot">
        
      </a>
    </div>
    <p>Gatebot was created in early 2015. Three years may not sound like much time, but since then we've grown dramatically and added layers of services to our software stack. Many of the internal integration points that we rely on today didn't exist then.</p><p>One of them is what we call the <i>Provision API</i>. When Gatebot sees an IP address, it needs to be able to figure out whether or not it’s one of Cloudflare’s addresses. <i>Provision API</i> is a simple RESTful API used to provide this kind of information.</p><p>This is a relatively new API, and prior to its existence, Gatebot had to figure out which IP addresses were Cloudflare addresses by reading a list of networks from a hard-coded file. In the code snippet above, the <i>ANYCAST_IPS</i> variable is populated using this file.</p>
    <div>
      <h3>Things went wrong</h3>
      <a href="#things-went-wrong">
        
      </a>
    </div>
    <p>Today, in an effort to reclaim some technical debt, we deployed new code that introduced Gatebot to <i>Provision API</i>.</p><p>What we did not account for, and what <i>Provision API</i> didn’t know about, was that <a href="/dns-resolver-1-1-1-1/">1.1.1.0/24 and 1.0.0.0/24</a> are special IP ranges. Frankly speaking, almost every IP range is "special" for one reason or another, since our IP configuration is rather complex. But our recursive DNS resolver ranges are even more special: they are relatively new, and we're using them in a very unique way. Our hardcoded list of Cloudflare addresses contained a manual exception specifically for these ranges.</p><p>As you might be able to guess by now, we didn't implement this manual exception while we were doing the integration work. Remember, the whole idea of the fix was to remove the hardcoded gotchas!</p>
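<p>A hypothetical reconstruction (names and data shapes invented for illustration; this is not Gatebot source code) of what that manual exception amounted to:</p>

```python
import ipaddress

# Hypothetical sketch: the hardcoded anycast list carved the resolver
# prefixes out of the generic DNS-attack handling with an explicit check.
RESOLVER_PREFIXES = [ipaddress.ip_network("1.1.1.0/24"),
                     ipaddress.ip_network("1.0.0.0/24")]

def is_resolver_range(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in RESOLVER_PREFIXES)

# Dropping a check like this while switching to the Provision API meant
# resolver traffic fell through to the ordinary DNS mitigation path.
assert is_resolver_range("1.1.1.1")
assert not is_resolver_range("198.51.100.1")
```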
    <div>
      <h3>Impact</h3>
      <a href="#impact">
        
      </a>
    </div>
    <p>The effect was that, after pushing the new code release, our systems interpreted the resolver traffic as an attack. The automatic systems deployed DNS mitigations for our DNS resolver IP ranges for 17 minutes, between 17:58 and 18:13 UTC on May 31st. This caused the 1.1.1.1 DNS resolver to be globally inaccessible.</p>
    <div>
      <h3>Lessons Learned</h3>
      <a href="#lessons-learned">
        
      </a>
    </div>
    <p>Gatebot, our DDoS mitigation system, wields great power, and we failed to test today's changes to it thoroughly enough. We are using today’s incident to improve our internal systems.</p><p>Our team is incredibly proud of 1.1.1.1 and Gatebot, but today we fell short, and we want to apologize to all of our customers. The next time we mitigate 1.1.1.1 traffic, we will make sure there is a legitimate attack hitting us.</p> ]]></content:encoded>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Post Mortem]]></category>
            <category><![CDATA[Mitigation]]></category>
            <category><![CDATA[1.1.1.1]]></category>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[Resolver]]></category>
            <category><![CDATA[Gatebot]]></category>
            <guid isPermaLink="false">6ZUfF0dLtSHFE2WWfwpgLJ</guid>
            <dc:creator>Marek Majkowski</dc:creator>
        </item>
        <item>
            <title><![CDATA[Rate Limiting: Delivering more rules, and greater control]]></title>
            <link>https://blog.cloudflare.com/rate-limiting-delivering-more-rules-and-greater-control/</link>
            <pubDate>Mon, 21 May 2018 20:41:37 GMT</pubDate>
            <description><![CDATA[ With more platforms adopting DDoS safeguards like integrating mitigation services and enhancing bandwidth at vulnerable points, Layer 3 and 4 attacks are becoming far less effective than before. ]]></description>
            <content:encoded><![CDATA[ <p>With more and more platforms taking the <a href="https://www.cloudflare.com/learning/ddos/how-to-prevent-ddos-attacks/">necessary precautions against DDoS attacks</a> like integrating DDoS mitigation services and increasing bandwidth at weak points, Layer 3 and 4 attacks are just not as effective anymore. At Cloudflare, we have fully automated Layer 3/4 based protections with our internal platform, <a href="/meet-gatebot-a-bot-that-allows-us-to-sleep/">Gatebot</a>. In the last six months we have seen a large upward trend in Layer 7 based DDoS attacks. The key difference with these attacks is that they no longer rely on huge payloads (volumetric attacks), but on requests per second to exhaust server resources (CPU, disk, and memory). On a regular basis we see attacks that exceed 1 million requests per second. The graph below shows the number of Layer 7 attacks Cloudflare has monitored, which is trending up: on average around 160 attacks a day, with some days spiking to over 1,000 attacks.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2KCzsb3VI9QyzaxYXWfPlW/bff6e0c18354270863ab8b5626f7dff1/Screen-Shot-2018-05-21-at-10.36.27-AM.png" />
            
            </figure><p>A year ago, Cloudflare released <a href="/rate-limiting/">Rate Limiting</a>, and it is proving to be a hugely effective tool for customers to protect their web applications and <a href="https://www.cloudflare.com/learning/security/api/what-is-an-api/">APIs</a> from all sorts of attacks, from “low and slow” DDoS attacks, through to bot-based attacks, such as credential stuffing and content scraping. We’re pleased about the success our customers are seeing with Rate Limiting and are excited to announce additional capabilities to give our customers further control.</p>
    <div>
      <h3>So what’s changing?</h3>
      <a href="#so-whats-changing">
        
      </a>
    </div>
    <p>There are times when you clearly know that traffic is malicious. In cases like this, our existing Block action is proving effective for our customers. But there are times when it is not the best option and causes a negative user experience. Rather than risk a false positive, customers often want to challenge a client to ensure they are who they represent themselves to be, which in most situations means a human, not a bot.</p><p><b>Firstly</b>, to help customers more accurately identify traffic, we are adding Cloudflare JavaScript Challenge and Google reCAPTCHA (Challenge) mitigation actions to the UI and API for Pro and Business plans. The existing Block and Simulate actions remain. As a reminder, deploying a rule in Simulate means that you will not be charged for any requests; this is a great way to test new rules and make sure they have been configured correctly.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6wD9AHZXCrufghkJPOcrDR/912893206456995a98bf003226cfac6e/Screen-Shot-2018-05-21-at-10.36.39-AM.png" />
            
            </figure><p><b>Secondly</b>, we’re making Rate Limiting more dynamically scalable. A new feature has been added which allows Rate Limiting to count on Origin Response Headers for Business and Enterprise customers. The way this feature works is by matching attributes which are returned by the Origin to Cloudflare.</p>
    <div>
      <h3>The new capabilities - in action!</h3>
      <a href="#the-new-capabilities-in-action">
        
      </a>
    </div>
    <p>One of the things that really drives our innovation is solving the real problems we hear from customers every day. With that, we wanted to provide some real world examples of these new capabilities in action.</p><p>Each of the use cases has Basic and Advanced implementation options. After some testing, we found that tiering rate limits is an extremely effective solution against repeat offenders.</p><p><b>Credential Stuffing Protection</b> for Login Pages and APIs. The best way to build applications is to utilise standardized status codes. For example, if I fail to authenticate against an endpoint or a website, I should receive a “401” or “403”. Generally speaking, a user will often get their password wrong three times before selecting the “I forgot my password” option. Most credential stuffing bots will try thousands of times, cycling through many username and password combinations to see what works.</p><p>Here are some example rate limits which you can configure to protect your application from credential stuffing.</p><p><b>Basic</b>: Cloudflare offers a “Protect My Login” feature out of the box. Enter the URL for your login page and Cloudflare will create a rule such that clients that attempt to log in more than 5 times in 5 minutes will be blocked for 15 minutes.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4FhmlH6WeVwOhm3pyHjo9w/b5647d44c77165d7da31d69422c322a6/Screen-Shot-2018-05-21-at-10.36.47-AM.png" />
            
            </figure><p>With the new Challenge capabilities of Rate Limiting, you can customize the response parameters for log in to more closely match the behavior pattern for bots you see on your site through a custom-built rule.</p><p>Logging in four times in one minute is hard - I type fast, but couldn’t even do this. If I’m seeing this pattern in my logs, it is likely a bot. I can now create a Rate Limiting rule based on the following criteria:</p><table><tr><td><p><b>RuleID</b></p></td><td><p><b>URL</b></p></td><td><p><b>Count</b></p></td><td><p><b>Timeframe</b></p></td><td><p><b>Matching Criteria</b></p></td><td><p><b>Action</b></p></td></tr><tr><td><p>1</p></td><td><p>/login</p></td><td><p>4</p></td><td><p>1 minute</p></td><td><p>Method: POST
Status Code: 401,403</p></td><td><p>Challenge</p></td></tr></table><p>With this new rule, if someone tries to log in four times within a minute, they will be served a challenge. My regular human users will likely never hit it, but if they do, the challenge ensures they can still access the site.</p><p><b>Advanced</b>:
And sometimes bots are just super persistent in their attacks. We can tier rules together to tackle repeat offenders. For example, instead of creating just a single rule, we can create a series of rules which can be tiered to protect against persistent threats:</p><table><tr><td><p><b>RuleID</b></p></td><td><p><b>URL</b></p></td><td><p><b>Count</b></p></td><td><p><b>Timeframe</b></p></td><td><p><b>Matching Criteria</b></p></td><td><p><b>Action</b></p></td></tr><tr><td><p>1</p></td><td><p>/login</p></td><td><p>4</p></td><td><p>1 minute</p></td><td><p>Method: POST
Status Code: 401,403</p></td><td><p>JavaScript Challenge</p></td></tr><tr><td><p>2</p></td><td><p>/login</p></td><td><p>10</p></td><td><p>5 minutes</p></td><td><p>Method: POST
Status Code: 401,403</p></td><td><p>Challenge</p></td></tr><tr><td><p>3</p></td><td><p>/login</p></td><td><p>20</p></td><td><p>1 hour</p></td><td><p>Method: POST
Status Code: 401,403</p></td><td><p>Block for 1 day</p></td></tr></table><p>With this type of tiering, any genuine users who are just having a hard time remembering their login details whilst also being extremely fast typers will not be fully blocked. Instead, they will first be given an automated JavaScript challenge, followed by a traditional CAPTCHA if they hit the next limit. This is a much more user-friendly approach while still securing your login endpoints.</p>
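<p>For intuition, the escalation in the table above can be sketched as a small check over a client's recent failed logins; the thresholds mirror the table, while the evaluation code itself is only an illustration:</p>

```python
# Tiers: (failed-login count, window in seconds, action), most severe last.
TIERS = [
    (4, 60, "JavaScript Challenge"),
    (10, 300, "Challenge"),
    (20, 3600, "Block for 1 day"),
]

def action_for(failure_times, now):
    """Return the most severe action whose tier the client has crossed."""
    chosen = "allow"
    for count, window, action in TIERS:
        recent = [t for t in failure_times if now - t <= window]
        if len(recent) >= count:
            chosen = action
    return chosen
```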
    <div>
      <h4>Time-based Firewall</h4>
      <a href="#time-based-firewall">
        
      </a>
    </div>
    <p>Our IP Firewall is a powerful feature for blocking problematic IP addresses from accessing your app. This is particularly useful for repeated abuse, or for blocks driven by IP reputation or threat intelligence feeds integrated at the origin level.</p><p>While the IP firewall is powerful, maintaining and managing a list of IP addresses which are currently being blocked can be cumbersome. It becomes more complicated if you want to allow blocked IP addresses to “age out” if bad behavior stops after a period of time. This often requires authoring and managing a script and multiple API calls to Cloudflare.</p><p>The new Rate Limiting Origin Headers feature makes this all so much easier. You can now configure your origin to respond with a Header to trigger a Rate-Limit. To make this happen, we need to generate a Header at the Origin, which is then added to the response to Cloudflare. As we are matching on a static header, we can set a severity level based on the content of the Header. For example, for a repeat offender, you could respond with High as the Header value, which could Block for a longer period.</p><p>Create a Rate Limiting rule based on the following criteria:</p><table><tr><td><p><b>RuleID</b></p></td><td><p><b>URL</b></p></td><td><p><b>Count</b></p></td><td><p><b>Timeframe</b></p></td><td><p><b>Matching Criteria</b></p></td><td><p><b>Action</b></p></td></tr><tr><td><p>1</p></td><td><p>*</p></td><td><p>1</p></td><td><p>1 second</p></td><td><p>Method: _ALL_
Header: X-CF-Block = low</p></td><td><p>Block for 5 minutes</p></td></tr><tr><td><p>2</p></td><td><p>*</p></td><td><p>1</p></td><td><p>1 second</p></td><td><p>Method: _ALL_
Header: X-CF-Block = medium</p></td><td><p>Block for 15 minutes</p></td></tr><tr><td><p>3</p></td><td><p>*</p></td><td><p>1</p></td><td><p>1 second</p></td><td><p>Method: _ALL_
Header: X-CF-Block = high</p></td><td><p>Block for 60 minutes</p></td></tr></table><p>Once that Rate-Limit has been created, Cloudflare’s Rate Limiting will kick in immediately when that Header is received.</p>
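<p>On the origin side, all that is needed is to attach the header. Here is a minimal sketch (the severity thresholds below are invented for illustration; only the header name and values match the example rules above):</p>

```python
# Map severity to the block durations used in the example rules above.
BLOCK_MINUTES = {"low": 5, "medium": 15, "high": 60}

def build_response_headers(offence_count):
    """Attach an X-CF-Block severity header for the edge rules to match on."""
    headers = {"Content-Type": "text/html"}
    if offence_count >= 10:
        headers["X-CF-Block"] = "high"      # repeat offender
    elif offence_count >= 3:
        headers["X-CF-Block"] = "medium"
    elif offence_count >= 1:
        headers["X-CF-Block"] = "low"
    return headers
```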
    <div>
      <h4>Enumeration Attacks</h4>
      <a href="#enumeration-attacks">
        
      </a>
    </div>
    <p>Enumeration attacks are proving to be increasingly popular and pesky to mitigate. With enumeration attacks, attackers identify an expensive operation in your app and hammer at it to tie up resources and slow or crash your app. For example, an app that offers the ability to look up a user profile requires a database lookup to validate whether the user exists. In an enumeration attack, attackers will send a random set of characters to that endpoint in quick succession, causing the database to grind to a halt.</p><p>Rate Limiting to the rescue!</p><p>One of our customers was hit with a huge enumeration attack on their platform earlier this year, where the aggressors were trying to do exactly what we described above, in an attempt to overload their database platform. Their Rate Limiting configuration blocked over 100,000,000 bad requests during the 6-hour attack.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1pwVBMpUcd368swYqwG5BY/497e8e574ff8d71d03ce7f44ed46c65c/Screen-Shot-2018-05-21-at-10.36.57-AM.png" />
            
            </figure><p>When a query is sent to the app, and the user is not found, the app serves a 404 (page not found). A very basic approach is to set a rate limit for 404s. If a user crosses a threshold of 404’s in a period of time, set the app to challenge the user to prove themselves to be a real person.</p><table><tr><td><p><b>RuleID</b></p></td><td><p><b>URL</b></p></td><td><p><b>Count</b></p></td><td><p><b>Timeframe</b></p></td><td><p><b>Matching Criteria</b></p></td><td><p><b>Action</b></p></td></tr><tr><td><p>1</p></td><td><p>*</p></td><td><p>10</p></td><td><p>1 minute</p></td><td><p>Method: GET
Status Code: 404</p></td><td><p>Challenge</p></td></tr></table><p>To catch repeat offenders, you can tier the Rate Limits:</p><table><tr><td><p><b>RuleID</b></p></td><td><p><b>URL</b></p></td><td><p><b>Count</b></p></td><td><p><b>Timeframe</b></p></td><td><p><b>Matching Criteria</b></p></td><td><p><b>Action</b></p></td></tr><tr><td><p>1</p></td><td><p>/public/profile*</p></td><td><p>10</p></td><td><p>1 minute</p></td><td><p>Method: GET
Status Code: 404</p></td><td><p>JavaScript Challenge</p></td></tr><tr><td><p>2</p></td><td><p>/public/profile*</p></td><td><p>25</p></td><td><p>1 minute</p></td><td><p>Method: GET
Status Code: 200</p></td><td><p>Challenge</p></td></tr><tr><td><p>3</p></td><td><p>/public/profile*</p></td><td><p>50</p></td><td><p>10 minutes</p></td><td><p>Method: GET
Status Code: 200, 404</p></td><td><p>Block for 4 hours</p></td></tr></table><p>With this type of tiered defense in place, it means that you can “caution” an offender with a JavaScript challenge or Challenge (Google Captcha), and then “block” them if they continue.</p>
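<p>The counting behind the basic 404 rule is simple enough to sketch. This sliding-window tracker is an illustration of the idea, not Cloudflare's implementation:</p>

```python
from collections import deque

class NotFoundTracker:
    """Challenge a client that accumulates too many 404s in a time window."""
    def __init__(self, limit=10, window=60):
        self.limit, self.window = limit, window
        self.hits = {}  # ip -> deque of 404 timestamps

    def should_challenge(self, ip, status, now):
        if status != 404:
            return False
        q = self.hits.setdefault(ip, deque())
        q.append(now)
        while q and now - q[0] > self.window:
            q.popleft()     # drop hits that aged out of the window
        return len(q) >= self.limit
```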
    <div>
      <h4>Content Scraping</h4>
      <a href="#content-scraping">
        
      </a>
    </div>
    <p>Increasingly, content owners are wrestling with content scraping - malicious bots copying copyrighted images or assets and redistributing or reusing them. For example, we work with an <a href="https://www.cloudflare.com/ecommerce/">eCommerce store</a> that uses copyrighted images, and their images are appearing elsewhere on the web without their consent. Rate Limiting can help!</p><p>In their app, each page displays four copyrighted images: one at actual size and three as thumbnails. By looking at logs and user patterns, they determined that most users, at a stretch, would never view more than 10–15 products in a minute, which would equate to 40–60 loads from the images store.</p><p>They chose to tier their Rate Limiting rules to prevent end users from getting unnecessarily blocked when they were browsing heavily. <a href="https://www.cloudflare.com/learning/ai/how-to-prevent-web-scraping/">Blocking malicious attempts at content scraping</a> can be quite simple; however, it does require some forward planning. Placing the rate limit on the right URL is key to ensure you are placing the rule on exactly what you are trying to protect and not the broader content. Here’s an example set of rate limits this customer set to protect their images:</p><table><tr><td><p><b>RuleID</b></p></td><td><p><b>URL</b></p></td><td><p><b>Count</b></p></td><td><p><b>Timeframe</b></p></td><td><p><b>Matching Criteria</b></p></td><td><p><b>Action</b></p></td></tr><tr><td><p>1</p></td><td><p>/img/thumbs/*</p></td><td><p>10</p></td><td><p>1 minute</p></td><td><p>Method: GET
Status Code: 404</p></td><td><p>Challenge</p></td></tr><tr><td><p>2</p></td><td><p>/img/thumbs/*</p></td><td><p>25</p></td><td><p>1 minute</p></td><td><p>Method: GET
Status Code: 200</p></td><td><p>Challenge</p></td></tr><tr><td><p>3</p></td><td><p>/img/*</p></td><td><p>75</p></td><td><p>1 minute</p></td><td><p>Method: GET
Status Code: 200</p></td><td><p>Block for 4 hours</p></td></tr><tr><td><p>4</p></td><td><p>/img/*</p></td><td><p>5</p></td><td><p>1 minute</p></td><td><p>Method: GET
Status Code: 403, 404</p></td><td><p>Challenge</p></td></tr></table><p>As we can see here, rules 1 and 2 are counting based on the number of requests to each endpoint. Rule 3 is counting based on all hits to the image store, and if it gets above 75 requests, the user will be blocked for 4 hours. Finally, to avoid any enumeration or bots guessing image names and numbers, we are counting on 404 and 403s and challenging if we see unusual spikes.</p>
    <div>
      <h3>One more thing ... more rules, <i>totally rules!</i></h3>
      <a href="#one-more-thing-more-rules-totally-rules">
        
      </a>
    </div>
    <p>We want to ensure you have the rules you need to secure your app. To do that, we are increasing the number of available rules for Pro and Business, for no additional charge.</p><ul><li><p>Pro plans increase from 3 to 10 rules</p></li><li><p>Business plans increase from 3 to 15 rules</p></li></ul><p>As always, Cloudflare only charges for good traffic - requests that are allowed through Rate Limiting, not blocked. For more information click <a href="https://support.cloudflare.com/hc/en-us/articles/115000272247-Billing-for-Cloudflare-Rate-Limiting">here</a>.</p><p>The Rate-Limiting feature can be enabled within the Firewall tab on the Dashboard, or by visiting: <a href="https://www.cloudflare.com/a/firewall/">cloudflare.com/a/firewall</a></p> ]]></content:encoded>
            <category><![CDATA[Rate Limiting]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Speed & Reliability]]></category>
            <category><![CDATA[Attacks]]></category>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Mitigation]]></category>
            <guid isPermaLink="false">l8ac1fSE0Q5tV7W36HzV1</guid>
            <dc:creator>Alex Cruz Farmer</dc:creator>
        </item>
        <item>
            <title><![CDATA[Memcrashed - Major amplification attacks from UDP port 11211]]></title>
            <link>https://blog.cloudflare.com/memcrashed-major-amplification-attacks-from-port-11211/</link>
            <pubDate>Tue, 27 Feb 2018 14:38:35 GMT</pubDate>
            <description><![CDATA[ Over the last couple of days we've seen a big increase in an obscure amplification attack vector - using the memcached protocol, coming from UDP port 11211. In the past, we have talked a lot about amplification attacks happening on the internet.  ]]></description>
            <content:encoded><![CDATA[ <p>Over the last couple of days we've seen a big increase in an obscure amplification attack vector - using the <a href="https://github.com/memcached/memcached/blob/master/doc/protocol.txt">memcached protocol</a>, coming from UDP port 11211.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3Mh7uCbRBwKf3LjeioLBFy/34ac14cf2982454843c1cac88d77c6db/3829936641_f112ed1665_b.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by-sa/2.0/">CC BY-SA 2.0</a> <a href="https://www.flickr.com/photos/trawin/3829936641/">image</a> by <a href="https://www.flickr.com/photos/trawin/">David Trawin</a></p><p>In the past, we have talked a lot about amplification attacks happening on the internet. Our most recent two blog posts on this subject were:</p><ul><li><p><a href="/ssdp-100gbps/">SSDP amplifications crossing 100Gbps</a>. Funnily enough, since then we have been the target of a 196Gbps SSDP attack.</p></li><li><p><a href="/reflections-on-reflections/">General statistics about various amplification attacks we see</a>.</p></li></ul><p>The general idea behind all amplification attacks is the same. <a href="https://idea.popcount.org/2016-09-20-strange-loop---ip-spoofing/">An IP-spoofing capable attacker</a> sends forged requests to a vulnerable UDP server. The UDP server, not knowing the request is forged, politely prepares the response. The problem happens when thousands of responses are delivered to an unsuspecting target host, overwhelming its resources - most typically the network itself.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/16dgrwlKcAnFbOkNxSYolX/b1b7cff8a4fb93e7c0ce826bbed621a8/spoofing-1.png" />
            
            </figure><p>Amplification attacks are effective because the response packets are often much larger than the request packets. A carefully prepared technique allows an attacker with limited IP spoofing capacity (such as 1Gbps) to launch very large attacks (reaching hundreds of Gbps), "amplifying" the attacker's bandwidth.</p>
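<p>The arithmetic behind this is worth spelling out: the attacker's effective bandwidth is simply spoofing capacity multiplied by the protocol's amplification factor. The packet sizes below are made-up example numbers, not measurements:</p>

```python
# Back-of-the-envelope amplification math with illustrative sizes:
# a 60-byte spoofed request eliciting 3000 bytes of aggregate response.
def amplification_factor(request_bytes, response_bytes):
    return response_bytes / request_bytes

factor = amplification_factor(60, 3000)   # 50x for this example
attack_gbps = 1 * factor                  # 1Gbps of spoofed requests -> 50Gbps
```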
    <div>
      <h3>Memcrashed</h3>
      <a href="#memcrashed">
        
      </a>
    </div>
    <p>Obscure amplification attacks happen all the time. We often see "chargen" or "call of duty" packets hitting our servers.</p><p>The discovery of a new vector that allows very large amplification, though, happens rarely. This new memcached UDP DDoS is definitely in that category.</p><p>The <a href="https://ddosmon.net/insight/">DDoSMon from Qihoo 360</a> monitors amplification attack vectors and this chart shows recent memcached/11211 attacks:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1xImt2QRqij9hJLNe8uJm7/ea08387f1f5246b4f3c8866d2666ad3f/memcached-ddosmon.png" />
            
            </figure><p>The number of memcached attacks was relatively flat until it started spiking just a couple of days ago. Our charts confirm this as well; here are attacks in packets per second over the last four days:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5VD1Yd7gkKpd68RBfW2xYu/44afafe112b0d94774ddd44621e80a72/memcached-pps.png" />
            
            </figure><p>While the packets per second count is not that impressive, the bandwidth generated is:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1THC5a2xHEdnadw4AOEuoB/c169b23cbf7b4a735478a77e7f0a760a/memcached-gpb.png" />
            
            </figure><p>At peak we've seen 260Gbps of inbound UDP memcached traffic. This is massive for a new amplification vector. But the numbers don't lie. It's possible because all the reflected packets are very large. This is how it looks in tcpdump:</p>
            <pre><code>$ tcpdump -n -t -r memcrashed.pcap udp and port 11211 -c 10
IP 87.98.205.10.11211 &gt; 104.28.1.1.1635: UDP, length 13
IP 87.98.244.20.11211 &gt; 104.28.1.1.41281: UDP, length 1400
IP 87.98.244.20.11211 &gt; 104.28.1.1.41281: UDP, length 1400
IP 188.138.125.254.11211 &gt; 104.28.1.1.41281: UDP, length 1400
IP 188.138.125.254.11211 &gt; 104.28.1.1.41281: UDP, length 1400
IP 188.138.125.254.11211 &gt; 104.28.1.1.41281: UDP, length 1400
IP 188.138.125.254.11211 &gt; 104.28.1.1.41281: UDP, length 1400
IP 188.138.125.254.11211 &gt; 104.28.1.1.41281: UDP, length 1400
IP 5.196.85.159.11211 &gt; 104.28.1.1.1635: UDP, length 1400
IP 46.31.44.199.11211 &gt; 104.28.1.1.6358: UDP, length 13</code></pre>
            <p>The majority of packets are 1400 bytes in size. Doing the math: 23Mpps × 1400 bytes × 8 bits/byte gives 257Gbps of bandwidth, exactly what the chart shows.</p>
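<p>Checking that arithmetic:</p>

```python
# 23M packets/s at 1400 bytes each, converted to gigabits per second.
pps = 23_000_000
packet_bytes = 1400
gbps = pps * packet_bytes * 8 / 1e9
```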
    <div>
      <h3>Memcached does UDP?</h3>
      <a href="#memcached-does-udp">
        
      </a>
    </div>
    <p>I was surprised to learn that memcached does UDP, but there you go! The <a href="https://github.com/memcached/memcached/blob/master/doc/protocol.txt">protocol specification</a> shows that it's one of <i>the best protocols to use for amplification ever</i>! There are absolutely zero checks, and the data <i>WILL</i> be delivered to the client, with blazing speed! Furthermore, the request can be tiny and the response huge (up to 1MB).</p><p>Launching such an attack is easy. First, the attacker implants a large payload on an exposed memcached server. Then, the attacker spoofs the "get" request message with the target's address as the source IP.</p><p>A synthetic run captured with tcpdump shows the traffic:</p>
            <pre><code>$ sudo tcpdump -ni eth0 port 11211 -t
IP 172.16.170.135.39396 &gt; 192.168.2.1.11211: UDP, length 15
IP 192.168.2.1.11211 &gt; 172.16.170.135.39396: UDP, length 1400
IP 192.168.2.1.11211 &gt; 172.16.170.135.39396: UDP, length 1400
...(repeated hundreds times)...</code></pre>
            <p>15 bytes of request triggered 134KB of response. That's an amplification factor of roughly 10,000x! In practice we've seen a 15 byte request result in a 750kB response (that's a 51,200x amplification).</p>
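<p>The spoofed request really is tiny. As a sketch, the datagram is just memcached's 8-byte UDP frame header (request ID, sequence number, datagram count, reserved) followed by the plain ASCII command; the key name below is made up for illustration:</p>

```python
import struct

def memcached_udp_frame(command, request_id=0):
    """Build a memcached UDP datagram: 8-byte frame header plus ASCII command."""
    header = struct.pack("!HHHH", request_id, 0, 1, 0)  # seq 0, 1 datagram
    return header + command.encode() + b"\r\n"

probe = memcached_udp_frame("get injected_key")  # hypothetical key name
```

<p>Note that a "stats" command framed this way is exactly the 15-byte request seen in the tcpdump output above.</p>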
    <div>
      <h3>Source IPs</h3>
      <a href="#source-ips">
        
      </a>
    </div>
    <p>The vulnerable memcached servers are all around the globe, with higher concentration in North America and Europe. Here is a map of the source IPs we've seen in each of our 120+ points of presence:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/70pC9m63FE7QI31oRMluQW/b75920090ae2490ccfc1844dde812ed1/memcached-map2.png" />
            
            </figure><p>Interestingly, our datacenters in EWR, HAM and HKG see disproportionately large numbers of attacking IPs. This is because most of the vulnerable servers are located in major hosting providers. Here are the AS numbers of the IPs that we've seen:</p>
            <pre><code>┌─ips─┬─srcASN──┬─ASName───────────────────────────────────────┐
│ 578 │ AS16276 │ OVH                                          │
│ 468 │ AS14061 │ DIGITALOCEAN-ASN - DigitalOcean, LLC         │
│ 231 │ AS7684  │ SAKURA-A SAKURA Internet Inc.                │
│ 199 │ AS9370  │ SAKURA-B SAKURA Internet Inc.                │
│ 165 │ AS12876 │ AS12876                                      │
│ 119 │ AS9371  │ SAKURA-C SAKURA Internet Inc.                │
│ 104 │ AS16509 │ AMAZON-02 - Amazon.com, Inc.                 │
│ 102 │ AS24940 │ HETZNER-AS                                   │
│  81 │ AS26496 │ AS-26496-GO-DADDY-COM-LLC - GoDaddy.com, LLC │
│  74 │ AS36351 │ SOFTLAYER - SoftLayer Technologies Inc.      │
│  65 │ AS20473 │ AS-CHOOPA - Choopa, LLC                      │
│  49 │ AS49981 │ WORLDSTREAM                                  │
│  48 │ AS51167 │ CONTABO                                      │
│  48 │ AS33070 │ RMH-14 - Rackspace Hosting                   │
│  45 │ AS19994 │ RACKSPACE - Rackspace Hosting                │
│  44 │ AS60781 │ LEASEWEB-NL-AMS-01 Netherlands               │
│  42 │ AS45899 │ VNPT-AS-VN VNPT Corp                         │
│  41 │ AS2510  │ INFOWEB FUJITSU LIMITED                      │
│  40 │ AS7506  │ INTERQ GMO Internet,Inc                      │
│  35 │ AS62567 │ DIGITALOCEAN-ASN-NY2 - DigitalOcean, LLC     │
│  31 │ AS8100  │ ASN-QUADRANET-GLOBAL - QuadraNet, Inc        │
│  30 │ AS14618 │ AMAZON-AES - Amazon.com, Inc.                │
│  30 │ AS31034 │ ARUBA-ASN                                    │
└─────┴─────────┴──────────────────────────────────────────────┘</code></pre>
            <p>Most of the memcached servers we've seen came from AS16276 - OVH, AS14061 - Digital Ocean and AS7684 - Sakura.</p><p>In total we've seen only 5,729 unique source IPs of memcached servers. We're expecting to see much larger attacks in the future, as <a href="https://www.shodan.io">Shodan</a> reports 88,000 open memcached servers:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3M4vOFeXyjG82X94Qb5OdW/c1c32822cad31a0825bae7adf06a50ed/memcached-shodan.png" />
            
            </figure>
    <div>
      <h3>Let's fix it up</h3>
      <a href="#lets-fix-it-up">
        
      </a>
    </div>
    <p>It's necessary to fix this and prevent further attacks. Here is a list of things that should be done.</p>
    <div>
      <h4>Memcached Users</h4>
      <a href="#memcached-users">
        
      </a>
    </div>
    <p>If you are using memcached, please disable UDP support if you are not using it. On memcached startup you can specify <code>--listen 127.0.0.1</code> to listen only to localhost and <code>-U 0</code> to disable UDP completely. <i>By default memcached listens on INADDR_ANY and runs with UDP support ENABLED</i>. Documentation:</p><ul><li><p><a href="https://github.com/memcached/memcached/wiki/ConfiguringServer#udp">https://github.com/memcached/memcached/wiki/ConfiguringServer#udp</a></p></li></ul><p>You can easily test if your server is vulnerable by running:</p>
            <pre><code>$ echo -en "\x00\x00\x00\x00\x00\x01\x00\x00stats\r\n" | nc -q1 -u 127.0.0.1 11211
STAT pid 21357
STAT uptime 41557034
STAT time 1519734962
...</code></pre>
            <p>If you see a non-empty response (like the one above), your server is vulnerable.</p>
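<p>The same check can be scripted. This Python equivalent of the <code>nc</code> probe above is a sketch, and should only ever be pointed at hosts you own:</p>

```python
import socket

# The same 15-byte stats frame the echo | nc example sends.
PROBE = b"\x00\x00\x00\x00\x00\x01\x00\x00stats\r\n"

def udp_stats_open(host, port=11211, timeout=2.0):
    """Return True if the host answers a memcached UDP stats probe."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(PROBE, (host, port))
        try:
            data, _ = s.recvfrom(65535)
        except (socket.timeout, OSError):  # no reply, or port unreachable
            return False
    return len(data) > 0
```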
    <div>
      <h4>System administrators</h4>
      <a href="#system-administrators">
        
      </a>
    </div>
    <p>Please ensure that your memcached servers are firewalled from the internet! To test whether they can be accessed over UDP, use the <code>nc</code> example above; to verify whether TCP is closed, run <code>nmap</code>:</p>
            <pre><code>$ nmap TARGET -p 11211 -sU -sS --script memcached-info
Starting Nmap 7.30 ( https://nmap.org ) at 2018-02-27 12:44 UTC
Nmap scan report for xxxx
Host is up (0.011s latency).
PORT      STATE         SERVICE
11211/tcp open          memcache
| memcached-info:
|   Process ID           21357
|   Uptime               41557524 seconds
|   Server time          2018-02-27T12:44:12
|   Architecture         64 bit
|   Used CPU (user)      36235.480390
|   Used CPU (system)    285883.194512
|   Current connections  11
|   Total connections    107986559
|   Maximum connections  1024
|   TCP Port             11211
|   UDP Port             11211
|_  Authentication       no
11211/udp open|filtered memcache</code></pre>
            
    <div>
      <h4>Internet Service Providers</h4>
      <a href="#internet-service-providers">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7Dmq5baGJDRlwdmxqz9hHg/b9a8188adb7776dd16a7689d8936baa8/memcached-reflector.png" />
            
            </figure><p>In order to defeat such attacks in future, we need to fix vulnerable protocols and also IP spoofing. As long as IP spoofing is permissible on the internet, we'll be in trouble.</p><p>Help us out by tracking who is behind these attacks. We must know not who has problematic memcached servers, but <i>who sent them queries in the first place</i>. We can't do this without your help!</p>
    <div>
      <h4>Developers</h4>
      <a href="#developers">
        
      </a>
    </div>
    <p>Please please please: Stop using UDP. If you must, please don't enable it by default. If you do not know what an amplification attack is, I hereby forbid you from ever typing <code>SOCK_DGRAM</code> into your editor.</p><p>We've been down this road so many times. DNS, NTP, Chargen, SSDP and now memcached. If you use UDP, you must always respond with a strictly <i>smaller</i> packet size than the request. Otherwise your protocol will be abused. Also remember that people do forget to set up a firewall. Be a nice citizen. Don't invent a UDP-based protocol that lacks authentication of any kind.</p>
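<p>The "never reply with more than you received" rule is easy to honor. Here is a hedged sketch of a UDP handler that caps its reply at the request's size, sending a tiny marker (much as DNS sets its truncation bit) when the real answer would not fit; the marker value is illustrative:</p>

```python
def safe_udp_reply(request: bytes, full_response: bytes) -> bytes:
    """Cap a UDP reply at the request size, keeping the amplification factor <= 1."""
    if len(full_response) <= len(request):
        return full_response
    # Too big to send safely: return a one-byte "retry over TCP" marker
    # (the marker value here is made up) instead of amplifying.
    return b"\x01"[:len(request)]
```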
    <div>
      <h3>That's all</h3>
      <a href="#thats-all">
        
      </a>
    </div>
    <p>It's anyone's guess how large the memcached attacks will become before the vulnerable servers are cleaned up. There have already been rumors of 0.5 Tbps amplifications in the last few days, and this is just the start.</p><p>Finally, you are OK if you are a Cloudflare customer. Cloudflare's Anycast architecture works well to distribute the load of large amplification attacks, and unless your origin IP is exposed, you are safe behind Cloudflare.</p>
    <div>
      <h3>Epilogue</h3>
      <a href="#epilogue">
        
      </a>
    </div>
    <p>A comment (below) points out that the possibility of using memcached for DDoS was discussed in a <a href="http://powerofcommunity.net/poc2017/shengbao.pdf">2017 presentation</a>.</p><p><b>Update:</b> We received word from Digital Ocean, OVH, Linode and Amazon that they have tackled the memcached problem; their networks should not be a vector in future attacks. Hurray!</p><hr /><p><i>Dealing with DDoS attacks sounds interesting? Join our </i><a href="https://boards.greenhouse.io/cloudflare/jobs/589572"><i>world famous team</i></a><i> in London, Austin, San Francisco and our elite office in Warsaw, Poland</i>.</p> ]]></content:encoded>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Mitigation]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Attacks]]></category>
            <category><![CDATA[Vulnerabilities]]></category>
            <guid isPermaLink="false">2fuIMiXv55dtHVPjBPDYjH</guid>
            <dc:creator>Marek Majkowski</dc:creator>
        </item>
        <item>
            <title><![CDATA[On the Leading Edge - Cloudflare named a leader in The Forrester Wave: DDoS Mitigation Solutions]]></title>
            <link>https://blog.cloudflare.com/forrester-wave-ddos-mitigation-2017/</link>
            <pubDate>Thu, 07 Dec 2017 20:44:39 GMT</pubDate>
            <description><![CDATA[ Cloudflare has been recognized as a leader in the “Forrester WaveTM: DDoS Mitigation Solutions, Q4 2017.” ]]></description>
            <content:encoded><![CDATA[ 
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6aV35VB5CtFfVThX2VCKWs/6ca5bf5cf0057c588c9a302c58884e92/Forrester-3_4x_Blog.png" />
            
            </figure><p>Cloudflare has been recognized as a leader in the “Forrester Wave<sup>TM</sup>: DDoS Mitigation Solutions, Q4 2017.”</p><p>The <a href="/the-new-ddos-landscape/">DDoS landscape continues to evolve</a>. The increase in sophistication, frequency, and range of targets of DDoS attacks has placed greater demands on DDoS providers, many of which were evaluated in the report.</p><p>This year, Cloudflare received the highest scores possible in 15 criteria, including:</p><ul><li><p>Length of Implementation</p></li><li><p>Layers 3 and 4 Attacks Mitigation</p></li><li><p>DNS Attack Mitigation</p></li><li><p>IoT Botnets</p></li><li><p>Multi-Vector Attacks</p></li><li><p>Filtering Deployment</p></li><li><p>Secure Socket Layer Investigation</p></li><li><p>Mitigation Capacity</p></li><li><p>Pricing Model</p></li></ul><p>We believe that Cloudflare’s position as a leader in the report stems from the following:</p><ul><li><p>An architecture designed to address high-volume attacks. <a href="/how-cloudflares-architecture-allows-us-to-scale-to-stop-the-largest-attacks/">This post written in October 2016</a> provides some insight into how Cloudflare’s architecture scales to meet the most advanced DDoS attacks differently than legacy scrubbing centers.</p></li><li><p>In September 2017, due to the size and effectiveness of our network, we announced the elimination of “surge pricing” commonly found in other DDoS vendors by <a href="/unmetered-mitigation/">offering unmetered mitigation</a>. Regardless of what Cloudflare plan a customer is on—Free, Pro, Business, or Enterprise—we will never terminate a customer or charge more based on the size of an attack.</p></li><li><p>Because we protect over 7 million Internet properties, we have a unique view into the types of attacks launched across the Internet, especially harder-to-deflect Layer 7 application attacks. 
This allows us to reduce the amount of manual intervention and <a href="/meet-gatebot-a-bot-that-allows-us-to-sleep/">use automated mitigations to more quickly detect and block attacks</a>.</p></li><li><p>Our DDoS mitigation solution helps protect customers by integrating with not only a stack of other security features, such as SSL and WAF, but also with a full suite of performance features. With a highly scalable network of over 118 data centers, Cloudflare can both accelerate legitimate traffic and block malicious DDoS traffic.</p></li></ul><p>This combination of scale, ease-of-use through automatic mitigations, and integration with performance solutions continues to advance our mission to help build a better Internet.</p><p>At Cloudflare, our mission is to help build a better Internet - one that is performant, secure and reliable for all. We do this through a combination of scale, ease-of-use, and data-driven insights that enable us to deliver automatic mitigation. It is because of this focus and these types of innovation that we were able to offer unmetered DDoS mitigation at no additional cost to all of our customers this year. We are honored to be recognized as a leader in the Forrester Wave<sup>TM</sup>: DDoS Mitigation Solutions, Q4 2017 report.</p><p>To check out the full report, download your complimentary copy <a href="https://www.cloudflare.com/forrester-wave-ddos-mitigation-2017?utm_medium=organic&amp;utm_source=blog&amp;utm_campaign=201712-forrester-wave">here</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5ayjLZL9xaqzCDkou92koX/74def2b83778a9e809f3e7fde6c613a0/2017Q4_DDoS-Mitigation-Solutions_137209.png" />
            
            </figure><p></p> ]]></content:encoded>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Awards]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Mitigation]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">3OXI4i2ESESzGU1TttwPuI</guid>
            <dc:creator>Jen Taylor</dc:creator>
        </item>
        <item>
            <title><![CDATA[The New DDoS Landscape]]></title>
            <link>https://blog.cloudflare.com/the-new-ddos-landscape/</link>
            <pubDate>Thu, 23 Nov 2017 03:28:00 GMT</pubDate>
            <description><![CDATA[ News outlets and blogs will frequently compare DDoS attacks by the volume of traffic that a victim receives. Surely this makes some sense, right? The greater the volume of traffic a victim receives, the harder the attack is to mitigate - right?  ]]></description>
            <content:encoded><![CDATA[ <p>News outlets and blogs will frequently compare DDoS attacks by the volume of traffic that a victim receives. Surely this makes some sense, right? The greater the volume of traffic a victim receives, the harder the attack is to mitigate - right?</p><p>At least, this is how things used to work. An attacker would gain capacity and then use that capacity to launch an attack. With enough capacity, an attack would overwhelm the victim's network hardware with junk traffic such that it can no longer serve legitimate requests. If your web traffic is served by a server with a 100 Gbps port and someone sends you 200 Gbps, your network will be saturated and the website will be unavailable.</p><p>Recently, this dynamic has shifted as attackers have become far more sophisticated. The practical realities of the modern Internet have increased the amount of effort required to clog up the network capacity of a DDoS victim - attackers have noticed this and are now choosing to perform attacks higher up the network stack.</p><p>In recent months, Cloudflare has seen a dramatic reduction in simple attempts to flood our network with junk traffic. Whilst we continue to see large network-level attacks, in excess of 300 and 400 Gbps, such attacks have in general become far less common (the largest recent attack was just over 0.5 Tbps). This has been especially true since the end of September, when we made official a policy that we <a href="/unmetered-mitigation/">would not remove any customers from our network merely for receiving a DDoS attack that's too big</a>, including those on our free plan.</p><p>Far from attackers simply closing up shop, we see a trend whereby attackers are moving to more advanced application-layer attack strategies. This trend is not only seen in metrics from our automated attack mitigation systems, but has also been the experience of our frontline customer support engineers. 
Whilst we continue to see very large network level attacks, note that they are occurring less frequently since the introduction of Unmetered Mitigation:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4j6SZYixPDOxuWY5hwbHXE/36e14812f66afffc9c63960a2c6e1244/syn.png" />
            
            </figure><p>To understand how the landscape has made such a dramatic shift, we must first understand how DDoS attacks are performed.</p>
    <div>
      <h3>Performing a DDoS</h3>
      <a href="#performing-a-ddos">
        
      </a>
    </div>
    <blockquote><p>From presentation by <a href="https://twitter.com/IcyApril?ref_src=twsrc%5Etfw">@IcyApril</a> - First thing you absolutely need for a successful DDoS - is a cool costume. <a href="https://t.co/WIC0LjF4ka">pic.twitter.com/WIC0LjF4ka</a></p><p>— majek04 (@majek04) <a href="https://twitter.com/majek04/status/933332376159162370?ref_src=twsrc%5Etfw">November 22, 2017</a></p></blockquote><p>The first thing you need before you can carry out a DDoS Attack is capacity. You need the network resources to be able to overwhelm your victim.</p><p>To build up capacity, attackers have a few mechanisms at their disposal; three such examples are Botnets, IoT Devices and DNS Amplification:</p>
    <div>
      <h4>Botnets</h4>
      <a href="#botnets">
        
      </a>
    </div>
    <p>Computer viruses are deployed for multiple reasons: for example, they can harvest private information from users, or blackmail users into paying money to get their precious files back. Another use of computer viruses is building capacity to perform DDoS attacks.</p><p>A Botnet is a network of infected computers that are centrally controlled by an attacker; these zombie computers can then be used to send spam emails or perform DDoS attacks.</p><p>Consumers have access to faster Internet than ever before. In November 2016, the average UK broadband upload speed reached 4.3Mbps - this means a Botnet which has infected a little under 2,400 computers can launch an attack of around 10Gbps. Such capacity is more than enough to saturate the end networks that power most websites online.</p><p>On August 17th, 2017, multiple networks online were subject to significant attacks from a botnet known as WireX. Researchers from Akamai, Cloudflare, Flashpoint, Google, Oracle Dyn, RiskIQ, Team Cymru and other organizations cooperated to combat this botnet - eventually leading to hundreds of Android apps being removed and a process being started to remove the malware-ridden apps from all devices.</p>
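The capacity arithmetic above is easy to check. A back-of-the-envelope sketch by the editor, using the figures cited in the post:

```python
import math

upload_mbps = 4.3   # average UK broadband upload speed, November 2016 (per the post)
target_gbps = 10    # attack size mentioned in the post

# How many infected hosts are needed to reach the target at full upload rate?
hosts_needed = math.ceil(target_gbps * 1000 / upload_mbps)
print(hosts_needed)  # 2326 -- "a little under 2,400 computers"
```

The real number would be somewhat higher, since infected machines rarely sustain their full upload rate, but the order of magnitude holds.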
    <div>
      <h4>IoT Devices</h4>
      <a href="#iot-devices">
        
      </a>
    </div>
    <p>More and more of our everyday appliances are being embedded with Internet connectivity. Like other types of technology, they can be taken over with malware and controlled to launch large-scale DDoS attacks.</p><p>Towards the end of last year, we began to see Internet-connected cameras start to launch <a href="/say-cheese-a-snapshot-of-the-massive-ddos-attacks-coming-from-iot-cameras/">large DDoS attacks</a>. Video cameras were advantageous to attackers in the sense that these devices need to be connected to networks with enough bandwidth to stream video.</p><p>Mirai was one such botnet which targeted Internet-connected cameras and Internet routers. It would start by logging into the web dashboard of a device using a table of 60 default usernames and passwords, then installing malware on the device.</p><p>Where users have set their own passwords instead of leaving the defaults, other pieces of malware can use Dictionary Attacks to repeatedly guess simple user-configured passwords, using a list of common passwords like the one shown below. I have self-censored some of the passwords; apparently users can be in a fairly angry state of mind when setting them:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3lxX56Rn4vULNewXMTViZ8/f7e7739a0d93d0762854ca96c6802116/Screen-Shot-2017-11-24-at-16.22.59.png" />
            
            </figure><p>Passwords aside, back in May, I blogged specifically about some examples of security risks we are starting to see which are specific to IoT devices: <a href="/iot-security-anti-patterns/">IoT Security Anti-Patterns</a>.</p>
    <div>
      <h4>DNS Amplification</h4>
      <a href="#dns-amplification">
        
      </a>
    </div>
    <p>DNS is the phonebook of the Internet; in order to reach this site, your local computer used DNS to look up which IP Address would serve traffic for <code>blog.cloudflare.com</code>. I can perform this DNS query from my command line using <code>dig A blog.cloudflare.com</code>:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6Hh5Ef8wCUFfsLgQFSTDnL/1c1a6f2b786c35624b8ac302552f0005/Screen-Shot-2017-11-24-at-16.37.47.png" />
            
</figure><p>Firstly, notice that the response is pretty big - certainly bigger than the question we asked.</p><p>DNS is built on a transport protocol called UDP. With UDP it's easy to forge the source of a query, as UDP doesn't require a handshake before a response is sent.</p><p>Due to these two factors, someone is able to make a DNS query on behalf of someone else: a relatively small DNS query will result in a long response being sent somewhere else.</p><p>There are open DNS resolvers online that will take a request from anyone and send the response to anyone else. In most cases open DNS resolvers should not be exposed to the Internet (most are open due to configuration mistakes); when a resolver is intentionally exposed, security steps should be taken - StrongArm has <a href="https://strongarm.io/blog/secure-open-dns-resolver/">a primer on securing open DNS resolvers</a> on their blog.</p><p>Let's use a hypothetical to illustrate this point. Imagine that you wrote to a mail order retailer requesting a catalogue (if you still know what one is). You'd send a relatively short postcard with your contact information and your request - you'd then get back quite a big catalogue. Now imagine you did the same, but sent hundreds of these postcards with the address of someone else on them. Assuming the retailer was obliging enough to send such a vast number of catalogues, your friend could wake up one day with their front door blocked by catalogues.</p><p>In 2013, we blogged about how one DNS Amplification attack we faced <a href="/the-ddos-that-almost-broke-the-internet/">almost broke the Internet</a>; however, aside from one exceptionally large attack (occurring just after we launched <a href="/unmetered-mitigation/">Unmetered DDoS Mitigation</a>), DNS Amplification attacks have recently made up a low proportion of the attacks we see:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4zWEAcupocMXEFYGTN7qfk/9ef20d45969ecfb65ab4661beb390918/dns.png" />
            
            </figure><p>Whilst we're seeing fewer of these attacks, you can find a more detailed overview on our learning centre: <a href="https://www.cloudflare.com/learning/ddos/dns-amplification-ddos-attack/">DNS Amplification Attack</a></p>
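The asymmetry described above is easy to see on the wire. Here's an editor's sketch (Python standard library only) that hand-assembles the same kind of A query <code>dig</code> sends, to show how few bytes the question takes; the transaction ID is arbitrary:

```python
import struct

def build_dns_query(name: str, qtype: int = 1) -> bytes:
    """Assemble a minimal DNS query in wire format (RFC 1035)."""
    header = struct.pack(">HHHHHH",
                         0x1234,      # transaction ID (arbitrary)
                         0x0100,      # flags: standard query, recursion desired
                         1, 0, 0, 0)  # one question, no other records
    # QNAME: each label is length-prefixed, terminated by a zero byte.
    qname = b"".join(bytes([len(label)]) + label.encode("ascii")
                     for label in name.split(".")) + b"\x00"
    question = qname + struct.pack(">HH", qtype, 1)  # QTYPE=A, QCLASS=IN
    return header + question

query = build_dns_query("blog.cloudflare.com")
print(len(query))  # 37 -- the entire question fits in 37 bytes
```

A response carrying multiple answer records can run to hundreds or thousands of bytes, so the amplification factor is simply that response size divided by these few dozen bytes - which is why a forged source address turns any open resolver into a traffic multiplier.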
    <div>
      <h3>DDoS Mitigation: The Old Priorities</h3>
      <a href="#ddos-mitigation-the-old-priorities">
        
      </a>
    </div>
    <p>Using the capacity an attacker has built up, they can send junk traffic to a web property. This is referred to as a Layer 3/4 attack. This kind of attack primarily seeks to saturate the network capacity of the victim's network.</p><p>Above all, mitigating these attacks requires capacity. If you get an attack of 600 Gbps and you only have 10 Gbps of capacity, you either need to pay an intermediary network to filter traffic for you or have your network go offline due to the force of the attack.</p><p>As a network, Cloudflare works by passing a customer's traffic through our network; in doing so, we are able to apply performance optimisations and security filtering to the traffic we see. One such security filter is removing junk traffic associated with Layer 3/4 DDoS attacks.</p><p>Cloudflare's network was built when large scale DDoS Attacks were becoming a reality. Huge network capacity, spread out across the world in many different data centres, makes it easier to absorb large attacks. We currently have over 15 Tbps of capacity, and this figure is growing fast.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/64a0Lhlr7C7Fj66wdHTi4K/c119b1c033db44582bda9643fe58cf34/network-map-gradient.png" />
            
</figure><p>Preventing DDoS attacks needs a little more sophistication than just capacity though. While traditional Content Delivery Networks are built using Unicast technology, Cloudflare's network is built using an Anycast design.</p><p>In essence, this means that network traffic is routed to the nearest available Point-of-Presence and it is not possible for an attacker to override this routing behaviour - the routing is effectively performed using BGP, the routing protocol of the Internet.</p><p>Unicast networks will frequently use technology like DNS to steer traffic to close data centres. This routing can easily be overridden by an attacker, allowing them to force attack traffic to a single data centre. This is not possible with Cloudflare's Anycast network, meaning we maintain full control of how our traffic is routed (providing intermediary ISPs respect our routes). With this network design, we have the ability to rapidly update routing decisions, even against ISPs which ordinarily do not respect cache expiration times (TTLs) for DNS records.</p><p>Cloudflare's network also maintains an Open Peering Policy; we are open to interconnecting our network with any other network without cost. This means we tend to eliminate intermediary networks from our paths. When we are under attack, we usually have a very short network path from the attacker to us - this means there are no intermediary networks which can suffer collateral damage.</p>
    <div>
      <h3>The New Landscape</h3>
      <a href="#the-new-landscape">
        
      </a>
    </div>
    <p>I started this blog post with a chart which demonstrates the frequency of a type of network-layer attack known as a SYN Flood against the Cloudflare network. You'll notice how the largest attacks are further spaced out over the past few months:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6UvuxVBCEB5uFaqbDnN1XH/05d3fb315da365196f34578b57ea9cff/syn.png" />
            
            </figure><p>You can also see that this trend does not follow when compared to a graph of Application Layer DDoS attacks which we continue to see coming in:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/0o48AirpaxWXQkjV6lmcZ/39598752e41f9188e77a70c5edad67ca/layer7.png" />
            
</figure><p>The chart above has an important caveat: an Application Layer attack is defined by what <b>we</b> determine to be an attack. Application Layer (Layer 7) attacks are far harder to distinguish from real traffic than Layer 3/4 attacks. The attacks effectively resemble normal web requests, instead of junk traffic.</p><p>Attackers can order their Botnets to perform attacks against websites using "Headless Browsers" which have no user interface. Such Headless Browsers work exactly like normal browsers, except that they are controlled programmatically instead of being controlled via a window on a user's screen.</p><p>Botnets can use Headless Browsers to effectively make HTTP requests that load and behave just like ordinary web requests. As this can be done programmatically, they can order bots to repeat these HTTP requests rapidly - effectively taking up the entire capacity of a website, taking it offline for ordinary visitors.</p><p>This is a non-trivial problem to solve. At Cloudflare, we have specific services like <a href="/meet-gatebot-a-bot-that-allows-us-to-sleep/">Gatebot</a> which identify DDoS attacks by picking up on anomalies in network traffic. We have tooling like <a href="https://support.cloudflare.com/hc/en-us/articles/200170076-What-does-I-m-Under-Attack-Mode-do-">"I'm Under Attack Mode"</a> to analyse traffic to ensure the visitor is human. This is, however, only part of the story.</p><p>A $20/month server running a resource-intensive e-commerce platform may not be able to cope with any more than a dozen concurrent HTTP requests before being unable to serve any more traffic.</p><p>An attack which can take down a small e-commerce site will likely not even be a drop in the ocean for Cloudflare's network, which sees around 10% of Internet requests.</p><p>The chart below outlines DDoS attacks per day against Cloudflare customers, but it is important to bear in mind that this includes what <b>we</b> define as an attack. 
In recent times, Cloudflare has built specific products to help customers define what they think an attack looks like and how much traffic they feel they should cope with.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/jDOUHwa7eNwq42lzB65M8/dee1a55374ae1f37e7f60c85472662cc/cfddos.png" />
            
            </figure>
    <div>
      <h3>A Web Developers Guide to Defeating an Application Layer DDoS Attack</h3>
      <a href="#a-web-developers-guide-to-defeating-an-application-layer-ddos-attack">
        
      </a>
    </div>
    <p>One of the reasons why Application Layer DDoS attacks are so attractive is the uneven balance between the computational overhead required for someone to request a web page and the computational cost of serving one. Serving a dynamic website requires all kinds of operations: fetching information from a database, firing off API requests to separate services, rendering a page, writing log lines and potentially even pushing data down a message queue.</p><p>Fundamentally, there are two ways of dealing with this problem:</p><ul><li><p>making the balance between requester and server less asymmetric by making it easier to serve web requests</p></li><li><p>limiting requests which are in such excess that they are blatantly abusive</p></li></ul><p>It remains critical that you have a high-capacity DDoS mitigation network in front of your web application; one of the reasons Application Layer attacks are increasingly attractive to attackers is that networks have gotten good at mitigating volumetric attacks at the network layer.</p><p>Cloudflare has found that when performing Application Layer attacks, attackers will sometimes pick cryptographic ciphers that are the hardest for servers to compute and are not usually accelerated in hardware. This means attackers will try to consume more of your server's resources by using the very fact that you offer encrypted HTTPS connections against you. Having a proxy in front of your web traffic has the added benefit of ensuring that an attacker has to establish a brand new secure connection before reaching your origin web server - effectively meaning you don't have to worry about Presentation Layer attacks.</p><p>Additionally, offloading services which don't have custom application logic (like DNS) to managed providers can help ensure you have less surface area to worry about at the Application Layer.</p>
    <div>
      <h4>Aggressive Caching</h4>
      <a href="#aggressive-caching">
        
      </a>
    </div>
    <p>One of the ways to make it easier to serve web requests is to use some form of caching. There are multiple forms of caching; however, here I'm going to be talking about how you enable caching for HTTP requests.</p><p>Suppose you're using a CMS (Content Management System) to update your blog; the vast majority of visitors to your blog will see a page identical to every other visitor's. It is only when a visitor logs in or leaves a comment that they will see a dynamic page, unlike every other page that's been rendered.</p><p>Despite the vast majority of HTTP requests to specific URLs being identical, your CMS has to regenerate the page for every single request as if it was brand new. Application Layer DDoS attacks exploit this to make their attacks more brutal.</p><p>Caching proxies like NGINX and services like Cloudflare allow you to specify that until a user has a browser cookie that de-anonymises them, content can be served from cache. Alongside performance benefits, these configuration changes can prevent the most crude Application Layer DDoS Attacks.</p><p>For further information on this, you can consult the NGINX guide to caching or alternatively see my blog post on caching anonymous page views:</p><ul><li><p><a href="https://www.nginx.com/blog/nginx-caching-guide/">NGINX Caching Guide</a></p></li><li><p><a href="/caching-anonymous-page-views/">Caching Anonymous Page Views at the Edge with Cloudflare</a></p></li></ul>
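The cookie-based bypass described above boils down to a small decision function. This is an editor's sketch, not how NGINX or Cloudflare implement it, and the <code>session</code> cookie name is hypothetical (whatever your CMS sets after login or commenting):

```python
def should_serve_from_cache(method: str, cookies: dict) -> bool:
    """Serve cached HTML only for safe requests from anonymous visitors."""
    if method not in ("GET", "HEAD"):
        return False  # never cache state-changing requests
    if "session" in cookies:
        return False  # "session" is a hypothetical login/comment cookie;
                      # de-anonymised users must see dynamic pages
    return True

print(should_serve_from_cache("GET", {}))                  # True: anonymous visitor
print(should_serve_from_cache("GET", {"session": "abc"}))  # False: logged in
```

In practice this logic lives in the proxy configuration (e.g. a cache-bypass condition on the cookie header), so crude floods of anonymous page requests never reach the CMS at all.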
    <div>
      <h4>Rate Limiting</h4>
      <a href="#rate-limiting">
        
      </a>
    </div>
    <p>Caching isn't enough; state-changing HTTP requests like POST, PUT and DELETE are not safe to cache, so making these requests can bypass the caching used to prevent Application Layer DDoS attacks. Additionally, attackers can attempt to vary URLs to bypass advanced caching behaviour.</p><p>Software exists for web servers to be able to perform rate limiting before anything hits dynamic logic; examples of such tools include <a href="https://www.fail2ban.org/wiki/index.php/Main_Page">Fail2Ban</a> and <a href="https://httpd.apache.org/docs/trunk/mod/mod_ratelimit.html">Apache mod_ratelimit</a>.</p><p>If you do rate limiting on your server itself, be sure to configure your edge network to cache the rate limit block pages, such that attackers cannot perform Application Layer attacks once blocked. This can be done by caching responses with a 429 status code against a Custom Cache Key based on the IP Address.</p><p>Services like Cloudflare offer Rate Limiting at their edge network; for example <a href="https://www.cloudflare.com/rate-limiting/">Cloudflare Rate Limiting</a>.</p>
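A minimal sliding-window limiter of the kind such tools build on might look like the following. This is an illustrative editor's sketch keyed by client IP, not the actual implementation of any of the products mentioned above:

```python
import time
from collections import defaultdict

class RateLimiter:
    """Allow at most `limit` requests per `window` seconds per client IP."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(list)  # ip -> timestamps of recent requests

    def allow(self, ip: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window.
        recent = [t for t in self.hits[ip] if now - t < self.window]
        self.hits[ip] = recent
        if len(recent) >= self.limit:
            return False  # caller should respond with HTTP 429
        recent.append(now)
        return True

limiter = RateLimiter(limit=3, window=60)
results = [limiter.allow("203.0.113.7", now=t) for t in (0, 1, 2, 3)]
print(results)  # [True, True, True, False]
```

The point to note for the caching advice above: the <code>False</code> branch (the 429 page) should itself be cheap to serve, ideally straight from the edge cache, or the limiter just moves the bottleneck rather than removing it.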
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>As the capacity of networks like Cloudflare continue to grow, attackers move from attempting DDoS attacks at the network layer to performing DDoS attacks targeted at applications themselves.</p><p>For applications to be resilient to DDoS attacks, it is no longer enough to use a large network. A large network must be complemented with tooling that is able to filter malicious Application Layer attack traffic, even when attackers are able to make such attacks look near-legitimate.</p> ]]></content:encoded>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Attacks]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Mitigation]]></category>
            <category><![CDATA[DNS]]></category>
            <guid isPermaLink="false">4o1cEOKQJRXCb4DQJy1fWR</guid>
            <dc:creator>Junade Ali</dc:creator>
        </item>
        <item>
            <title><![CDATA[No Scrubs: The Architecture That Made Unmetered Mitigation Possible]]></title>
            <link>https://blog.cloudflare.com/no-scrubs-architecture-unmetered-mitigation/</link>
            <pubDate>Mon, 25 Sep 2017 13:00:33 GMT</pubDate>
            <description><![CDATA[ When building a DDoS mitigation service it’s incredibly tempting to think that the solution is scrubbing centers or scrubbing servers. I, too, thought that was a good idea in the beginning,  ]]></description>
            <content:encoded><![CDATA[ <p>When building a DDoS mitigation service it’s incredibly tempting to think that the solution is scrubbing centers or scrubbing servers. I, too, thought that was a good idea in the beginning, but experience has shown that there are serious pitfalls to this approach.</p><p>A scrubbing server is a dedicated machine that receives all network traffic destined for an IP address and attempts to filter good traffic from bad. Ideally, the scrubbing server will only forward non-DDoS packets to the Internet application being attacked. A scrubbing center is a dedicated location filled with scrubbing servers.</p>
    <div>
      <h3>Three Problems With Scrubbers</h3>
      <a href="#three-problems-with-scrubbers">
        
      </a>
    </div>
    <p>The three most pressing problems with scrubbing are <i>bandwidth</i>, <i>cost</i> and <i>knowledge</i>.</p><p>The <i>bandwidth</i> problem is easy to see. As DDoS attacks have scaled to &gt;1 Tbps, having that much network capacity available is problematic. Provisioning and maintaining multiple Tbps of bandwidth for DDoS mitigation is expensive and complicated. And it needs to be located in the right place on the Internet to receive and absorb an attack. If it’s not, then attack traffic will need to be received at one location, scrubbed, and then clean traffic forwarded to the real server: with a limited number of locations, that can introduce enormous delays.</p><p>Imagine for a moment you’ve built a small number of scrubbing centers, and each center is connected to the Internet with many Gbps of connectivity. When a DDoS attack occurs, that center needs to be able to handle potentially 100s of Gbps of attack traffic at line rate. That means exotic network and server hardware. Everything from the line cards in routers, to the network adapter cards in the servers, to the servers themselves is going to be very expensive.</p><p>This (and bandwidth above) is one of the reasons DDoS mitigation has traditionally <i>cost</i> so much and been billed by attack size.</p><p>The final problem, <i>knowledge</i>, is the most easily overlooked. When you set out to build a scrubbing server you are building something that has to separate good packets from bad.</p><p>At first this seems easy (let’s filter out all TCP ACK packets for non-established connections, for example), and it’s easy to excite low-level engineers about writing high-performance code to do that. But attackers are not stupid; they’ll throw legitimate-looking traffic at a scrubbing server, and it gets harder and harder to distinguish good from bad.</p><p>At that point, scrubbing engineers need to become protocol experts at all levels of the stack. 
That means you have to build a competency in all levels of TCP/IP, DNS, HTTP, TLS, etc. And that’s hard.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/64KLtaUHX9bNnt7gPObyUW/f8f58affdea28aa1b5e07ca8c8d31a16/4663602907_c7312af549_b.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by-sa/2.0/">CC BY-SA 2.0</a> <a href="https://www.flickr.com/photos/lisibo/4663602907/in/photolist-dreKAq-877bEF-dC6SaH-87fRQa-87aUnd-878ubK-87j6bE-87fP1t-879ZGj-8777Wc-87aMK5-87iYJq-87bY93-878n2g-879Wmw-87a431-87b9sQ-87bUBy-87bupQ-87bmFE-87fxZ6-87bqyW-87aQm7-87bBCC-878E3P-87833H-87azmU-877ttK-877xNk-878QqX-87bJC5-87b69u-XUmTo9-2Zsyny-87b2YC-877L8r-ebekdb-8w1o6r-8w1nPr-8w1n7B-8w1hKt-8w4jCE-8w1jcZ-8w4qUS-8w1jJz-ovHP2-3feGiy-Wj2Q48-8w4nZQ-8w4m9s/">image</a> by <a href="https://www.flickr.com/photos/lisibo/">Lisa Stevens</a></p><p>The bottom line is scrubbing centers and exotic hardware are great marketing. But, like citadels of medieval times, they are monumentally expensive and outdated, overwhelmed by better weapons and warfighting techniques.</p><p>And many DDoS mitigation services that use scrubbing centers operate in an offline mode. They are only enabled when a DDoS occurs. This typically means that an Internet application will succumb to the DDoS attack before its traffic is diverted to the scrubbing center.</p><p>Just imagine citizens fleeing to hide behind the walls of the citadel under fire from an approaching army.</p>
    <div>
      <h3>Better, Cheaper, Smarter</h3>
      <a href="#better-cheaper-smarter">
        
      </a>
    </div>
    <p>There’s a subtler point about not having dedicated scrubbers: it forces us to build better software. If a scrubbing server becomes overwhelmed or fails, only the customer being scrubbed is affected; but when the mitigation happens on the very servers running the core service, it has to work and be effective.</p><p>I spoke above about the ‘knowledge gap’ that comes with dedicated DDoS scrubbing. The Cloudflare approach means that if bad traffic gets through, say a flood of bad DNS packets, it reaches a service owned and operated by people who are experts in that domain. If a DNS flood gets through our DDoS protection, it hits our custom DNS server, RRDNS, and the engineers who work on it can bring their expertise to bear.</p><p>This makes an enormous difference because the result is either improved DDoS scrubbing or a change to the software (e.g. the DNS stack) that improves its performance under load. We’ve lived that story many, many times and the entire software stack has improved because of it.</p><p>The approach Cloudflare took to DDoS mitigation is rather simple: make every single server in Cloudflare participate in mitigation, load balance DDoS attacks across the data centers and the servers within them, and then apply smarts to the handling of packets. These are the same servers, processors and cores that handle our entire service.</p><p>Eliminating scrubbing centers and dedicated hardware completely changes the cost of building a DDoS mitigation service.</p><p>We currently have around 15 Tbps of network capacity worldwide, but this capacity doesn’t require exotic network hardware. We are able to use low-cost, commodity networking equipment bound together using <a href="/the-internet-is-hostile-building-a-more-resilient-network/">network automation</a> to handle normal and DDoS traffic. Just as Google originally built its service by writing software that tied commodity servers together into a super (search) computer, our architecture binds commodity servers together into one giant network device.</p><p>By building the world’s <a href="https://www.peeringdb.com/net/4224">most peered network</a> we’ve built this capacity at reasonable cost and, more importantly, are able to handle attack traffic globally, wherever it originates, over low-latency links. No scrubbing solution can say the same.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3nhLKpOgJaQcxsCM0kFcHM/286dc2bbea0e3649aa73a8ad766adb66/upload.png" />
            
            </figure><p>And because Cloudflare manages DNS for our customers and uses an Anycast network, attack traffic originating from botnets is automatically distributed across our global network. Each data center deals with a portion of the DDoS traffic.</p><p>Within each data center, DDoS traffic is load balanced across multiple servers running our service, with each server handling a portion of the traffic. This spreading of DDoS traffic means that a single DDoS attack will be handled by a large number of individual servers across the world.</p><p>And as Cloudflare grows, our DDoS mitigation capacity grows automatically. Because our DDoS mitigation is built into our stack, it is always on: we mitigate a new DDoS attack every three minutes with no downtime for Internet applications and no need to ‘switch over’ to a scrubbing center.</p>
    <div>
      <h3>Inside a Server</h3>
      <a href="#inside-a-server">
        
      </a>
    </div>
    <p>Once all this global and local load balancing has occurred packets do finally hit a network adapter card in a server. It’s here that Cloudflare’s custom DDoS mitigation stack comes into play.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5SsunNzjBUScjiFqvOYqsC/a2f6047bc836d427abf6659b86826c3c/image2-1.png" />
            
            </figure><p>Over the years we’ve learned how to automatically detect and mitigate anything the Internet can throw at us. For most attacks, we rely on dynamically managing iptables, the standard Linux firewall. We’ve spoken about <a href="https://speakerdeck.com/majek04/lessons-from-defending-the-indefensible">the most effective techniques</a> in the past. iptables has a number of very powerful features which we select depending on the specific attack vector. In our experience, xt_bpf, ipset, hashlimit and connlimit are the most useful iptables modules.</p><p>For very large attacks, though, the Linux kernel is not fast enough. To relieve the kernel from processing an excessive number of packets, we experimented with various <a href="/kernel-bypass/">kernel bypass</a> techniques. We’ve settled on a partial kernel bypass interface: Solarflare’s EF_VI.</p>
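As an illustration of the dynamic-rule idea above, here is a hedged sketch of how a hashlimit rule might be generated programmatically. The iptables flags used are real, but the `hashlimit_rule` helper, the chain, and the rate parameters are invented for the example and are not Cloudflare's actual configuration:

```python
# Hypothetical sketch: build (but do not execute) an iptables command that
# drops per-source UDP traffic above a packet rate, using the hashlimit
# match. A management system would run this via subprocess during an attack.

def hashlimit_rule(chain: str, dport: int, pps_limit: int, name: str) -> list:
    """Construct an iptables argv that rate-limits traffic per source IP."""
    return [
        "iptables", "-A", chain,
        "-p", "udp", "--dport", str(dport),
        "-m", "hashlimit",
        "--hashlimit-above", f"{pps_limit}/second",
        "--hashlimit-mode", "srcip",        # track the limit per source IP
        "--hashlimit-name", name,
        "-j", "DROP",
    ]

# Example: drop sources sending more than 1000 pps to port 53.
cmd = hashlimit_rule("INPUT", 53, 1000, "dns_flood")
```

Because each rule is just data like this, adding, tweaking and removing rules across a fleet reduces to shipping small command lists to every server.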
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/f4YDKLOLmRBcMaV2CojRO/a38c51911bfaf0cd830d7d0fe284ed26/image1.png" />
            
            </figure><p>With EF_VI we can offload the processing of our firewall rules to a user-space program and easily process millions of packets per second on each server while keeping CPU usage low. This allows us to withstand the largest attacks without affecting our multi-tenant service.</p>
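To give a feel for what such a user-space filter does per packet, here is a simplified sketch of the hot-path decision, assuming a plain Ethernet + IPv4 + UDP frame with no VLAN tags or IP options beyond the header length field. The real code is C against the EF_VI API; this Python version only illustrates the parsing logic:

```python
import struct

# Hedged sketch of a kernel-bypass hot loop: parse just enough of each raw
# frame to make a drop/pass decision. Offsets assume Ethernet (14 bytes)
# followed by IPv4; this is illustrative, not production filter code.

def should_drop(frame: bytes, blocked_port: int) -> bool:
    ethertype = struct.unpack_from("!H", frame, 12)[0]
    if ethertype != 0x0800:                  # not IPv4: leave it alone
        return False
    ihl = (frame[14] & 0x0F) * 4             # IPv4 header length in bytes
    proto = frame[23]                        # protocol field of the IP header
    if proto != 17:                          # not UDP
        return False
    # UDP destination port sits 2 bytes into the UDP header
    dst_port = struct.unpack_from("!H", frame, 14 + ihl + 2)[0]
    return dst_port == blocked_port
```

The point of the exercise is that a decision like this touches only a handful of bytes per frame, which is what makes millions of packets per second per core feasible in a compiled implementation.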
    <div>
      <h3>Open Source</h3>
      <a href="#open-source">
        
      </a>
    </div>
    <p>Cloudflare’s vision is to help build a better Internet. Fixing DDoS is part of it. We’ve been relentlessly documenting the most important and dangerous attacks we’ve encountered, fighting botnets, and open sourcing critical pieces of our DDoS infrastructure.</p><p>We’ve open sourced various tools, from very low-level projects like our <a href="/introducing-the-bpf-tools/">BPF Tools</a>, which we use to fight <a href="/introducing-the-p0f-bpf-compiler/">DNS and SYN</a> floods, to contributions to <a href="https://github.com/openresty/">OpenResty</a>, a performant application framework on top of NGINX which is great for building L7 defenses.</p>
    <div>
      <h3>Further Reading</h3>
      <a href="#further-reading">
        
      </a>
    </div>
    <p>Cloudflare has written a great deal about DDoS mitigation in the past. Some example blog posts: <a href="/how-cloudflares-architecture-allows-us-to-scale-to-stop-the-largest-attacks/">How Cloudflare's Architecture Allows Us to Scale to Stop the Largest Attacks</a>, <a href="/reflections-on-reflections/">Reflections on reflection (attacks)</a>, <a href="/the-daily-ddos-ten-days-of-massive-attacks/">The Daily DDoS: Ten Days of Massive Attacks</a>, and <a href="/the-internet-is-hostile-building-a-more-resilient-network/">The Internet is Hostile: Building a More Resilient Network</a>.</p><p>And if you want to go deeper, my colleague Marek Majkowski <a href="/meet-gatebot-a-bot-that-allows-us-to-sleep/">dives into</a> the code we use for DDoS mitigation.</p>
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>Cloudflare’s DDoS mitigation architecture and custom software make Unmetered Mitigation possible. With them we can withstand the largest DDoS attacks, and as our network grows, our DDoS mitigation capability grows with it.</p> ]]></content:encoded>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Mitigation]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Attacks]]></category>
            <guid isPermaLink="false">3UnqF7Cu9bfhDiSUUu2G5r</guid>
            <dc:creator>John Graham-Cumming</dc:creator>
        </item>
        <item>
            <title><![CDATA[Meet Gatebot - a bot that allows us to sleep]]></title>
            <link>https://blog.cloudflare.com/meet-gatebot-a-bot-that-allows-us-to-sleep/</link>
            <pubDate>Mon, 25 Sep 2017 13:00:30 GMT</pubDate>
            <description><![CDATA[ In the past, we’ve spoken about how Cloudflare is architected to sustain the largest DDoS attacks. During traffic surges we spread the traffic across a very large number of edge servers.  ]]></description>
            <content:encoded><![CDATA[ <p>In the past, we’ve spoken about how <a href="/how-cloudflares-architecture-allows-us-to-scale-to-stop-the-largest-attacks/">Cloudflare is architected to sustain the largest DDoS attacks</a>. During traffic surges we spread the traffic across a very large number of edge servers. This architecture allows us to avoid having a single choke point because the traffic gets distributed externally across multiple datacenters and internally across multiple servers. We do that by employing Anycast and ECMP.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/52cb3pH0qnYZoAFRwCILPu/c195045b834c4be87d338318def95590/gatebot-3.jpg" />
            
            </figure><p>We don't use separate scrubbing boxes or specialized hardware - every one of our edge servers can perform advanced traffic filtering if the need arises. This allows us to scale up our DDoS capacity as we grow. Each of the new servers we add to our datacenters increases our maximum theoretical DDoS “scrubbing” power. It also scales down nicely - in smaller datacenters we don't have to overinvest in expensive dedicated hardware.</p><p>During normal operations our attitude to attacks is rather pragmatic. Since the inbound traffic is distributed across hundreds of servers we can survive periodic spikes and small attacks without doing anything. Vanilla Linux is remarkably resilient against unexpected network events. This is especially true since kernel 4.4 when <a href="https://lwn.net/Articles/659199/">the performance of SYN cookies was greatly improved</a>.</p><p>But at some point, malicious traffic volume can become so large that we must take the load off the networking stack. We have to minimize the amount of CPU spent on dealing with attack packets. Cloudflare operates a multi-tenant service and we must always have enough processing power to serve valid traffic. We can't afford to starve our HTTP proxy (nginx) or custom DNS server (named RRDNS, written in Go) of CPU. When the attack size crosses a predefined threshold (which varies greatly depending on specific attack type), we must intervene.</p>
    <div>
      <h3>Mitigations</h3>
      <a href="#mitigations">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3WIjf8ScvlEu9XR2E1qS4a/2da9a0fa8e7dafceb1c70938adf33bb2/Screen-Shot-2017-09-25-at-11.21.11-AM.png" />
            
            </figure><p>During large attacks we deploy mitigations to reduce the CPU consumed by malicious traffic. We have multiple layers of defense, each tuned to a specific attack vector.</p><p>First, there is “scattering”. Since we control DNS resolution, we are able to move the domains we serve between IP addresses. This is an effective technique as long as the attacks don’t follow the updated DNS resolutions, which is often the case for L3 attacks where the attacker has hardcoded the IP address of the target.</p><p>Next, there is a wide range of mitigation techniques that leverage iptables, the firewall built into the Linux kernel. But we don't use it like a conventional firewall, with a static set of rules. We continuously add, tweak and remove rules based on specific attack characteristics. Over the years we have mastered the most effective iptables extensions:</p><ul><li><p>xt_bpf</p></li><li><p>ipset</p></li><li><p>hashlimit</p></li><li><p>connlimit</p></li></ul><p>To make the most of iptables, we built a system to manage the iptables configuration across our entire fleet, allowing us to rapidly deploy rules everywhere. This fits our architecture nicely: due to Anycast, an attack against a single IP will be delivered to multiple locations, so running iptables rules for that IP on all servers makes sense.</p><p>Using stock iptables gives us plenty of confidence. When possible, we prefer to use off-the-shelf tools to deal with attacks.</p><p>Sometimes, though, even this is not sufficient. iptables is fast in the general case, but has its limits. During very large attacks, exceeding 1M packets per second per server, we shift the attack traffic from kernel iptables to a kernel bypass user-space program (which we call floodgate), a <a href="/kernel-bypass/">partial kernel bypass</a> solution using the Solarflare EF_VI interface. With this, each server can process more than 5M attack packets per second while consuming only a single CPU core. With floodgate we have a comfortable amount of CPU left for our applications, even during the largest network events.</p><p>Finally, there are a number of tweaks we can make at the HTTP layer. For specific attacks we disable HTTP keep-alives, forcing attackers to re-establish TCP sessions for each request. This sacrifices a bit of performance for valid traffic as well, but is a surprisingly powerful tool for throttling many attacks. For other attack patterns we turn the “I’m under attack” mode on, forcing the attack to hit our JavaScript challenge page.</p>
    <div>
      <h3>Manual attack handling</h3>
      <a href="#manual-attack-handling">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5JkPhtefpkMJhVoCdOB6Qu/a6b2729f532ee2aea60e8b4c302b9d15/cloudflare_outage.png.scaled500.png" />
            
            </figure><p>Early on these mitigations were applied manually by our tireless System Reliability Engineers (SREs). Unfortunately, it turns out that humans under stress... well, make mistakes. We learned this the hard way - one of the most famous incidents happened in <a href="/todays-outage-post-mortem-82515/">March 2013 when a simple typo</a> brought our whole network down.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3OtDHft56O7PHCALmswJoY/12605f0ab862bc99291977400fd47a1e/Screen-Shot-2017-09-25-at-11.19.11-AM.png" />
            
            </figure><p>Humans are also not great at applying precise rules. As our systems grew and mitigations became more complex, having many specific toggles, our SREs got overwhelmed by the details. It was challenging to present all the specific information about the attack to the operator. We often applied overly-broad mitigations, which were unnecessarily affecting legitimate traffic. All that changed with the introduction of Gatebot.</p>
    <div>
      <h3>Meet Gatebot</h3>
      <a href="#meet-gatebot">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7nri9vLoPsgtq19T6dHxf/0f9bf6c60d0bbe08b573491353a1211b/Screen-Shot-2017-09-25-at-11.19.31-AM.png" />
            
            </figure><p>To aid our SREs we developed a fully automatic mitigation system. We call it Gatebot<a href="#fn1">[1]</a>.</p><p>The main goal of Gatebot was to automate as much of the mitigation workflow as possible. That means: observe the network and note the anomalies, understand the targets of attacks and their metadata (such as the type of customer involved), and perform the appropriate mitigation actions.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/53NqKtsdEEISOc07d2wRZL/4a90e0727ee78f72c9cbc4bb535520da/Screen-Shot-2017-09-25-at-11.20.22-AM.png" />
            
            </figure><p>Nowadays we have multiple Gatebot instances - we call them “mitigation pipelines”. Each pipeline has three parts:</p><ol><li><p>“attack detection” or “signal” - a dedicated system detects anomalies in network traffic. This is usually done by sampling a small fraction of the network packets hitting our network and analyzing them using streaming algorithms. With this we have a real-time view of the current status of the network. This part of the stack is written in Golang, and even though it only examines the sampled packets, it's pretty CPU intensive. It might comfort you to know that at this very moment two big Xeon servers burn all of their combined 48 Skylake CPU cores toiling away counting packets and performing sophisticated analytics looking for attacks.</p></li><li><p>“reactive automation” or “business logic” - for each anomaly (attack) we determine who the target is, whether we can mitigate it, and with what parameters. Depending on the specific pipeline, the business logic may be anything from a trivial procedure to a multi-step process requiring a number of database lookups and potentially confirmation from a human operator. This code is not performance critical and is written in Python. To make it more accessible and readable by others in the company, we developed a simple functional, reactive programming engine. It helps us to keep the code clean and understandable, even as we add more steps, more pipelines and more complex logic. To give you a flavor of the complexity: imagine how the system should behave if a customer upgraded a plan during an attack.</p></li><li><p>“mitigation” - the previous step feeds specific mitigation instructions into the centralized mitigation management systems. The mitigations are deployed across the world to our servers, applications, customer settings and, in some cases, to the network hardware.</p></li></ol>
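The three-stage flow above can be sketched as a toy pipeline. Everything here (function names, the sample rate, the thresholds, the choice of actions) is invented for illustration; Gatebot's real logic is far more involved:

```python
# Toy, hypothetical rendition of the pipeline described above:
# signal -> business logic -> mitigation instructions.

SAMPLE_RATE = 1000            # assume 1-in-1000 packet sampling
ATTACK_PPS_THRESHOLD = 1_000_000

def detect(sampled_counts: dict) -> list:
    """Signal stage: scale sampled per-target counts back to estimated pps."""
    return [
        {"target": ip, "pps": n * SAMPLE_RATE}
        for ip, n in sampled_counts.items()
        if n * SAMPLE_RATE > ATTACK_PPS_THRESHOLD
    ]

def decide(anomaly: dict, customer_plan: str) -> dict:
    """Business-logic stage: pick mitigation parameters for one anomaly."""
    # Escalate to a heavier mitigation for bigger floods (purely illustrative).
    action = "iptables_drop" if anomaly["pps"] < 5_000_000 else "kernel_bypass"
    return {"target": anomaly["target"], "action": action, "plan": customer_plan}

def run_pipeline(sampled_counts: dict, plans: dict) -> list:
    """Mitigation stage: turn decisions into deployable instructions."""
    return [decide(a, plans[a["target"]]) for a in detect(sampled_counts)]
```

Even this toy version shows why the middle stage carries the complexity: the detection math is mechanical, but deciding *what* to deploy depends on who the target is and what they are entitled to.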
    <div>
      <h3>Sleeping at night</h3>
      <a href="#sleeping-at-night">
        
      </a>
    </div>
    <p>Gatebot operates constantly, without breaks for lunch. For the iptables mitigation pipelines alone, Gatebot is engaged between 30 and 1,500 times a day. Here is a chart of mitigations per day over the last six months:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2Gb8UBopCCuM9zdrBlVeic/39a7f9453e82e719272573c35143f316/attacks-iptables.png" />
            
            </figure><p>Gatebot is much faster and much more precise than even our most experienced SREs. Without Gatebot we wouldn’t be able to operate our service with the appropriate level of confidence. Furthermore, Gatebot has proved to be remarkably adaptable - we started by automating the handling of Layer 3 attacks, but soon we proved that the general model works well for automating other things. Today we have more than 10 separate Gatebot instances doing everything from mitigating Layer 7 attacks to informing our Customer Support team of misbehaving customer origin servers.</p><p>Since Gatebot’s inception we have learned a great deal from the "detection / logic / mitigation" workflow. We reused this model in our <a href="/the-internet-is-hostile-building-a-more-resilient-network/">Automatic Network System</a> which is used to relieve network congestion<a href="#fn2">[2]</a>.</p><p>Gatebot allows us to protect our users no matter the plan. Whether you are on a Free, Pro, Business or Enterprise plan, Gatebot is working for you. This is why we can afford to provide the same level of DDoS protection for all our customers<a href="#fn3">[3]</a>.</p><hr /><p><i>Dealing with attacks sounds interesting? Join our </i><a href="https://boards.greenhouse.io/cloudflare/jobs/589572"><i>world famous DDoS team</i></a><i> in London, Austin, San Francisco and our elite office in Warsaw, Poland</i>.</p><hr /><ol><li><p>Fun fact: all our components in this area are called “gate-something”, like: gatekeeper, gatesetter, floodgate, gatewatcher, gateman... Who said that naming things must be hard? <a href="#fnref1">↩︎</a></p></li><li><p>Some of us have argued that this system should be called Netbot. <a href="#fnref2">↩︎</a></p></li><li><p>Note: there are caveats. Ask your Success Engineer for specifics! <a href="#fnref3">↩︎</a></p></li></ol> ]]></content:encoded>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Mitigation]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Attacks]]></category>
            <category><![CDATA[Gatebot]]></category>
            <guid isPermaLink="false">2L5aTRvTEqUKS0ayXhKqk1</guid>
            <dc:creator>Marek Majkowski</dc:creator>
        </item>
        <item>
            <title><![CDATA[Unmetered Mitigation: DDoS Protection Without Limits]]></title>
            <link>https://blog.cloudflare.com/unmetered-mitigation/</link>
            <pubDate>Mon, 25 Sep 2017 13:00:00 GMT</pubDate>
            <description><![CDATA[ This is the week of Cloudflare's seventh birthday. It's become a tradition for us to announce a series of products each day of this week and bring major new benefits to our customers. We're beginning with one I'm especially proud of: Unmetered Mitigation. ]]></description>
            <content:encoded><![CDATA[ <p>This is the week of Cloudflare's seventh birthday. It's become a tradition for us to announce a series of products each day of this week and bring major new benefits to our customers. We're beginning with one I'm especially proud of: Unmetered Mitigation.</p><p><a href="https://creativecommons.org/licenses/by-sa/2.0/">CC BY-SA 2.0</a> <a href="https://www.flickr.com/photos/vassilisonline/7032242423/in/photolist-bHq5F2-5Roy7N-8gQU1J-YtEpRs-7NB6zG-99R9QE-99N15c-VEBnjA-VMEQcN-8TZg4D-e1D1kW-afUB7x-UHtShC-8fMgip-gPkeLb-dV9aSt-cNCSuf-ra86m1-UA6FDo-SUdJL9-ptQUV5-mbpzbr-jxx5ep-prY8mE-pu6z7u-gYfpGR-iEiaEg-99R8R7-pi7RgX-bhgKm8-Vfc7gy-5QKzfs-bznfB8-ViKyFe-bNknvp-8ML8Wg-UHtYif-3UZcW-VHokAH-omifo3-9GJBdU-bBFcLt-bK5vji-kLmBKL-9GFL3p-ahZRdG-8EVRmL-U57No2-gmSvHy-6jXLSo">image</a> by <a href="https://www.flickr.com/photos/vassilisonline/">Vassilis</a></p><p>Cloudflare runs one of the largest networks in the world. One of our key services is DDoS mitigation and we deflect a new DDoS attack aimed at our customers every three minutes. We do this with over 15 terabits per second of DDoS mitigation capacity. That's more than the publicly announced capacity of every other DDoS mitigation service we're aware of combined. And we're continuing to invest in our network to expand capacity at an accelerating rate.</p>
    <div>
      <h3>Surge Pricing</h3>
      <a href="#surge-pricing">
        
      </a>
    </div>
    <p>Virtually every Cloudflare competitor will send you a bigger bill if you are unlucky enough to be targeted by an attack. We've seen examples of small businesses that survive massive attacks only to be crippled by the bills other DDoS mitigation vendors sent them. From the beginning of Cloudflare's history, it never felt right that you should have to pay more if you came under attack. That feels barely a step above extortion.</p><p>With today’s announcement we are eliminating this industry standard of ‘surge pricing’ for DDoS attacks. Why should customers pay more just to defend themselves? Charging more when the customer is experiencing a painful attack feels wrong, just as surge pricing in the rain hurts ride-sharing customers when they need a ride the most.</p>
    <div>
      <h3>End of the FINT</h3>
      <a href="#end-of-the-fint">
        
      </a>
    </div>
    <p>That said, from our early days, we would sometimes fail customers off our network if the size of an attack they received got large enough that it affected other customers. Internally, we referred to this as FINTing (for Fail INTernal) a customer.</p><p>The standards for when a customer would get FINTed were situation dependent. We had rough thresholds depending on what plan they were on, but the general rule was to keep a customer online unless the size of the attack impacted other customers. For customers on higher tiered plans, when our automated systems didn't handle the attacks themselves, our technical operations team could take manual steps to protect them.</p><p>Every morning I receive a list of all the customers that were FINTed the day before. Over the last four years the number of FINTs has dwindled. The reality is that our network today is at such a scale that we are able to mitigate even the largest DDoS attacks without it impacting other customers. This is almost always handled automatically. And, when manual intervention is required, our techops team has gotten skilled enough that it isn't overly taxing.</p>
    <div>
      <h3>Aligning With Our Customers</h3>
      <a href="#aligning-with-our-customers">
        
      </a>
    </div>
    <p>So today, on the first day of our Birthday Week celebration, we make it official for all our customers: Cloudflare will no longer terminate customers, regardless of the size of the DDoS attacks they receive, regardless of the plan level they use. And, unlike the prevailing practice in the industry, we will never jack up your bill after the attack. Doing so, frankly, is perverse.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4Qe9itqCX89UqGqEqzfNzD/138593b36c9b254e3db69a81c6072abe/4311678389_7a87aeda67_b.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by-sa/2.0/">CC BY-SA 2.0</a> <a href="https://www.flickr.com/photos/archer10/4311678389/in/photolist-7z1tLz-eSLMSn-8kgnpT-at38nu-6VmkAV-pXaz61-9tjcuc-Trgo6P-FpEuw-9tnafL-2tXqpW-6oPBGD-8kghmX-npVc6v-654RiP-69Xju6-5dRXNV-nXLz7d-dLWdW4-8hEkgQ-74JnHN-8hEjXY-dG3Cea-at3aEC-dCqKwX-aoVKwE-bAPYDW-9idzwb-8kjBY3-8hEmnu-eYrFnB-8Z2A4R-iRan2N-9Yv2By-dcwh3y-8hB6rv-5Jpwgw-5Jkfoz-5JpvG7-bpiiMU-Rw7uFC-HAv4Mj-qLAGJJ-g8uixx-8hEk8f-9jdRyp-8hB6a8-6SUq6q-4AjJ48-8BxTBD">image</a> by <a href="https://www.flickr.com/photos/archer10/">Dennis Jarvis</a></p><p>We call this Unmetered Mitigation. It stems from a basic idea: you shouldn't have to pay more to be protected from bullies who try and silence you online. Regardless of what Cloudflare plan you use — Free, Pro, Business, or Enterprise — we will never tell you to go away or that you need to pay us more because of the size of an attack.</p><p>Cloudflare's higher tier plans will continue to offer more sophisticated reporting, tools, and customer support to better tune our protections against whatever <a href="https://www.cloudflare.com/products/zero-trust/threat-defense/">threats</a> you face online. But volumetric DDoS mitigation is now officially unlimited and unmetered.</p>
    <div>
      <h3>Setting the New Standard</h3>
      <a href="#setting-the-new-standard">
        
      </a>
    </div>
    <p>Back in 2014, during Cloudflare's birthday week, we announced that we were making encryption free for all our customers. We did it because it was the right thing to do and we'd finally developed the technical systems we needed to do it at scale. At the time, people said we were crazy. I'm proud of the fact that, three years later, the rest of the industry has followed our lead and encryption by default has become the standard.</p><p>I'm hopeful the same will happen with DDoS mitigation. If the rest of the industry moves away from the practice of surge pricing and builds DDoS mitigation in by default then it would largely end DDoS attacks for good. We took a step down that path today and hope, like with encryption, the rest of the industry will follow.</p><p>Want to know more? Read <a href="/no-scrubs-architecture-unmetered-mitigation">No Scrubs: The Architecture That Made Unmetered Mitigation Possible</a> and <a href="/meet-gatebot-a-bot-that-allows-us-to-sleep">Meet Gatebot - a bot that allows us to sleep</a>.</p> ]]></content:encoded>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Birthday Week]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Attacks]]></category>
            <category><![CDATA[Mitigation]]></category>
            <guid isPermaLink="false">5Xi7Et2DpSpRbSaVzGkbmK</guid>
            <dc:creator>Matthew Prince</dc:creator>
        </item>
        <item>
            <title><![CDATA[Cloudflare Rate Limiting - Insight, Control, and Mitigation against Layer 7 DDoS Attacks]]></title>
            <link>https://blog.cloudflare.com/rate-limiting/</link>
            <pubDate>Thu, 13 Apr 2017 20:34:00 GMT</pubDate>
            <description><![CDATA[ Today, Cloudflare is extending its Rate Limiting service by allowing any of our customers to sign up. Our Enterprise customers have enjoyed the benefits of Cloudflare’s Rate Limiting offering for the past several months.  ]]></description>
            <content:encoded><![CDATA[ <p>Today, Cloudflare is extending its <a href="https://www.cloudflare.com/rate-limiting/">Rate Limiting</a> service by allowing any of our customers to sign up. Our Enterprise customers have enjoyed the benefits of Cloudflare’s Rate Limiting offering for the past several months. As part of our mission to build a better Internet, we believe that everyone should have the ability to sign up for the service to protect their websites and <a href="https://www.cloudflare.com/learning/security/api/what-is-an-api/">APIs</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/XnkcibsLLCJF6rpgPqksN/083f6b9aca04a2f0129a862cea15c263/benjamin-child-16017.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by/2.0/">CC-BY 2.0</a> <a href="https://unsplash.com/photos/IGqMKnl6LNE">image</a> by <a href="https://unsplash.com/@bchild311">Benjamin Child</a></p><p>Rate Limiting is one more feature in our arsenal of tools that help protect our customers against denial-of-service attacks, brute-force password attempts, and other types of abusive behavior targeting the application layer. Application layer attacks are usually a barrage of HTTP/S requests which may look like they originate from real users, but are typically generated by machines (or bots). As a result, application layer attacks are often harder to detect and can more easily bring down a site, application, or API. Rate Limiting complements our existing DDoS protection services by providing control of and insight into Layer 7 DDoS attacks.</p><p>Rate Limiting is now available to all customers across <a href="https://www.cloudflare.com/plans/">all plans</a> as an optional paid feature. The first 10,000 qualifying requests are free, which allows customers to start using the feature without any cost.</p>
    <div>
      <h4>Real world examples of how Rate Limiting helped Cloudflare customers</h4>
      <a href="#real-world-examples-of-how-rate-limiting-helped-cloudflare-customers">
        
      </a>
    </div>
    <p>Over the past few months, Cloudflare customers ranging from <a href="https://www.cloudflare.com/ecommerce/">e-commerce companies</a> to high-profile, ad-driven platforms have been using this service to mitigate malicious attacks. It has made a big difference to their businesses: they’ve stopped revenue loss, reduced infrastructure costs, and protected valuable information, such as intellectual property and customer data.</p><p>Several common themes have emerged for customers who have been successfully using Rate Limiting over the past couple of months. The following are examples of some of the issues those customers have faced and how Rate Limiting addressed them.</p>
    <div>
      <h4>High-volume attacks designed to bring down e-commerce sites</h4>
      <a href="#high-volume-attacks-designed-to-bring-down-e-commerce-sites">
        
      </a>
    </div>
    <p>Buycraft, <a href="https://www.cloudflare.com/case-studies/buycraft/">a Minecraft e-commerce platform</a>, was subjected to denial-of-service attacks which could have brought down the e-commerce stores of its 500,000+ customers. Rate Limiting addresses this common attack type by blocking offending IP addresses at its network edge, so the malicious traffic doesn’t reach the origin servers and impact customers.</p>
    <div>
      <h4>Attacks against API endpoints</h4>
      <a href="#attacks-against-api-endpoints">
        
      </a>
    </div>
    <p>Haveibeenpwned.com <a href="https://www.cloudflare.com/case-studies/troy-hunt/">provides an API</a> that surfaces accounts that have been hacked to help potential victims identify whether their credentials have been compromised. Troy Hunt, the service’s creator, decided to use Cloudflare’s Rate Limiting to protect his API from malicious traffic, leading to <a href="https://www.cloudflare.com/solutions/ecommerce/optimization/">improved performance</a> and reduced infrastructure costs.</p>
    <div>
      <h4>Brute-force login attacks</h4>
      <a href="#brute-force-login-attacks">
        
      </a>
    </div>
    <p>After IT consulting firm 2600 Solutions, which manages WordPress sites for clients, was brute-forced over 200 times in a month, owner Jeff Williams decided to use Cloudflare Rate Limiting. By blocking excessive failed login attempts, they not only protected their clients’ sites from being compromised but also ensured that legitimate users were not slowed by degraded application performance.</p>
    <div>
      <h4>Bots scraping the site for content</h4>
      <a href="#bots-scraping-the-site-for-content">
        
      </a>
    </div>
    <p>Another Cloudflare customer saw valuable content being scraped from their site by competitors using bots. Competitors then used this scraped content to boost their own search engine ranking at the expense of the targeted site. Our customer lost tens of thousands of dollars before using Cloudflare’s Rate Limiting to <a href="https://www.cloudflare.com/learning/ai/how-to-prevent-web-scraping/">prevent the bots from scraping content</a>.</p>
    <div>
      <h4>How do I get started with Rate Limiting?</h4>
      <a href="#how-do-i-get-started-with-rate-limiting">
        
      </a>
    </div>
    <p>Anyone can start taking advantage of Cloudflare’s Rate Limiting. In the Cloudflare Dashboard, go to the Firewall tab and, within the Rate Limiting card, click “Enable Rate Limiting.”</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6QLMfHzGF2J1ju4dLTmP6C/a6826edb544bb1a66b6a0cde5bd0a2d4/Screen-Shot-2017-04-13-at-1.17.47-PM.png" />
            
            </figure><p>Even though you will be prompted to enter a payment method to start using the service, you will not be charged for <a href="https://support.cloudflare.com/hc/en-us/articles/115000272247-Billing-for-Cloudflare-Rate-Limiting">the first 10,000 qualifying requests</a>. Once done, <a href="https://support.cloudflare.com/hc/en-us/articles/115001635128-Configuring-Rate-Limiting-from-UI">you’ll be able to create rules</a>.</p><p>If you are on an Enterprise plan, contact your Cloudflare Customer Success Manager to enable Rate Limiting.</p>
    <div>
      <h4>Tighter control over the type of traffic to rate limit</h4>
      <a href="#tighter-control-over-the-type-of-traffic-to-rate-limit">
        
      </a>
    </div>
    <p>As customers begin to understand attack patterns and their own application’s potential vulnerabilities, they can tighten criteria. All customers can create path-specific rules using wildcards (for example: <a href="http://www.example.com/login/*">www.example.com/login/*</a> or <a href="http://www.example.com/*/checkout.php">www.example.com/*/checkout.php</a>). Customers on a Business or higher plan can also restrict a rule to specific HTTP request methods.</p>
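These wildcard patterns behave like shell-style globs. As a rough illustration of which request URLs the example rules above would cover (using Python's fnmatch, not Cloudflare's matching engine):

```python
from fnmatch import fnmatch

# The same example patterns discussed above, in shell-glob style.
rules = ["www.example.com/login/*", "www.example.com/*/checkout.php"]

def matches_any(url, patterns):
    """Return True if the URL is covered by any rate-limiting pattern."""
    return any(fnmatch(url, pattern) for pattern in patterns)

covered_login = matches_any("www.example.com/login/attempt-42", rules)   # first rule
covered_checkout = matches_any("www.example.com/shop/checkout.php", rules)  # second rule
covered_about = matches_any("www.example.com/about", rules)              # neither rule
```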
    <div>
      <h4>Simulate traffic to tune your rules</h4>
      <a href="#simulate-traffic-to-tune-your-rules">
        
      </a>
    </div>
    <p>Customers on Pro and higher plans can put rules in ‘simulate’ mode. A rule in simulate mode will not actually block malicious traffic, but will show you what traffic would have been blocked had the rule been live. All customers will have analytics (coming soon) to give them insight into the traffic patterns on their site and the efficacy of their rules.</p>
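Conceptually, simulate mode runs the same matching logic but swaps the enforcement action for logging. A minimal sketch of the idea (not the product's internals):

```python
def apply_rule(rule_matches, simulate, audit_log):
    """Conceptual sketch of simulate mode: the matching logic runs either
    way, but a simulated rule records what it would have blocked instead
    of actually blocking."""
    if not rule_matches:
        return "allow"
    if simulate:
        audit_log.append("would block")  # visible in analytics; no traffic dropped
        return "allow"
    return "block"

audit_log = []
decision_simulated = apply_rule(True, simulate=True, audit_log=audit_log)
decision_live = apply_rule(True, simulate=False, audit_log=audit_log)
```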
    <div>
      <h4>Next Steps</h4>
      <a href="#next-steps">
        
      </a>
    </div>
    <ul><li><p>If you haven’t enabled Rate Limiting yet, go to the <a href="https://www.cloudflare.com/a/firewall/">Firewall App</a> and enable Rate Limiting</p></li><li><p><a href="https://support.cloudflare.com/hc/en-us/articles/115001635128-Configuring-Rate-Limiting-from-UI">Create your first rule</a></p></li><li><p>For more information, including a demo of Rate Limiting in action, visit <a href="http://www.cloudflare.com/rate-limiting/">www.cloudflare.com/rate-limiting/</a>.</p></li></ul><p></p> ]]></content:encoded>
            <category><![CDATA[Rate Limiting]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Product News]]></category>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Attacks]]></category>
            <category><![CDATA[Mitigation]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">3jJX6AKmlz33ZH8YTcMMpa</guid>
            <dc:creator>Timothy Fong</dc:creator>
        </item>
        <item>
            <title><![CDATA[The Daily DDoS: Ten Days of Massive Attacks]]></title>
            <link>https://blog.cloudflare.com/the-daily-ddos-ten-days-of-massive-attacks/</link>
            <pubDate>Fri, 02 Dec 2016 13:21:26 GMT</pubDate>
            <description><![CDATA[ In March 2015, we wrote about a Winter of Whopping Weekend DDoS Attacks where we were seeing 400Gbps attacks. We speculated that attackers were busy with something else during the week. ]]></description>
            <content:encoded><![CDATA[ <p>Back in March, my colleague Marek wrote about a <a href="/a-winter-of-400gbps-weekend-ddos-attacks/">Winter of Whopping Weekend DDoS Attacks</a> where we were seeing 400Gbps attacks occurring mostly at the weekends. We speculated that attackers were busy with something else during the week.</p><p>This winter we've seen a new pattern: attackers aren't taking the week off, but they do seem to be working regular hours.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2YzJzuM2mlCQrDQ4UGWgHQ/f2926d403c2f9514734633f664dae2b4/13368326574_3766538c08_z.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a> <a href="https://www.flickr.com/photos/librariesrock/13368326574/in/photolist-mnjaKd-hsNTp3-fwRfUh-i1ANfq-GkBCdy-io1qcq-q41yUK-EGwk5F-qfEmaz-775pC8-7G9WfY-5Xp3jH-7MAEcz-hTZPp7-CxnxVN-E1hrZV-6XFPw7-pHQoCm-6ZTcR2-91i9fM-8SQY6m-dDEg1a-q2RLX4-7EH6CN-apKYx5-FdkLrN-7puG8m-9EW9k4-dHeD4u-EHJoRC-dr2FhP-rvRMra-rpH2dX-dDr3e1-6D8zy4-5Ytm2k-qpjQ1M-4gw1Er-nNFeMh-aXdZgg-aMNdAg-9gxZe7-tA6Xf-kdRa6N-7yK95i-8Quqz-DY8m5W-bRioGt-qNFviU-uEzyE2">image</a> by <a href="https://www.flickr.com/photos/librariesrock/">Carol VanHook</a></p><p>On November 23, the day before US Thanksgiving, our systems detected and mitigated an attack that peaked at 172Mpps and 400Gbps. The attack started at 1830 UTC and lasted non-stop for almost exactly 8.5 hours stopping at 0300 UTC. It felt as if an attacker 'worked' a day and then went home.</p>
            <figure>
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3A8FzbN8hX6Qogv7KpVFeT/c63ab79f724e12acc13f67f0feaf6d8b/Screen-Shot-2016-12-02-at-11.00.41.png" />
            </figure><p>The very next day the same thing happened again (although the attack started 30 minutes earlier at 1800 UTC).</p>
            <figure>
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/UN2ck87CU98i7RiTPQemC/6777edecad24dde445b9ae5d1613e43f/Screen-Shot-2016-12-02-at-11.03.16.png" />
            </figure><p>On the third day the attacker started promptly at 1800 UTC but went home a little early at around 0130 UTC. Even so, they managed to push the attack to peaks of over 200Mpps and 480Gbps.</p>
            <figure>
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/62tWVtXGbAYxfzBhWERpAf/f23a5f447c6e75a5f57a2306c2f1aba9/Screen-Shot-2016-12-02-at-11.04.21.png" />
            </figure><p>And the attacker just kept this up day after day. Right through <a href="https://blog.cloudflare.com/the-truth-about-black-friday-and-cyber-monday/">Thanksgiving, Black Friday, Cyber Monday</a> and into this week. Night after night attacks were peaking at 400Gbps and hitting 320Gbps for hours on end.</p><p>This chart shows the packet rate in Mpps.</p>
            <figure>
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/TjvOKcMZd6BGnqjD7yadL/411f519fbbb05708cfe4e90a1258542d/Screen-Shot-2016-12-02-at-10.53.27-1.png" />
            </figure><p>This chart shows the attack bandwidth in <i>gigabytes</i> per second (multiply by 8 to get Gbps).</p>
            <figure>
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2oLL7gsr5eXDP11OqpP6lg/c40e36f2dbbe5b57c222b0b7f9ea01e0/Screen-Shot-2016-12-02-at-13.51.10-1.png" />
            </figure><p>This Tuesday things got interesting. The attacker stopped taking the night off and moved on to working 24 hours a day.</p><p>Another curiosity with these attacks is that they are <i>not</i> coming from the much-talked-about Mirai botnet. They are using different attack software and are sending very large L3/L4 floods aimed at the TCP protocol. The attacks are also highly concentrated in a small number of locations, mostly on the US West Coast.</p><p>Throughout, we've mitigated these attacks without impact on customers.</p><p>As we've written before, <a href="/how-cloudflares-architecture-allows-us-to-scale-to-stop-the-largest-attacks/">we architected</a> Cloudflare to handle massive attacks automatically. If you are interested in working on systems like this, we're <a href="https://www.cloudflare.com/join-our-team/">hiring</a>.</p> ]]></content:encoded>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Attacks]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Mitigation]]></category>
            <guid isPermaLink="false">DaVQmdoqfBcu7YFyCjBYQ</guid>
            <dc:creator>John Graham-Cumming</dc:creator>
        </item>
        <item>
            <title><![CDATA[The Internet is Hostile: Building a More Resilient Network]]></title>
            <link>https://blog.cloudflare.com/the-internet-is-hostile-building-a-more-resilient-network/</link>
            <pubDate>Tue, 08 Nov 2016 18:56:49 GMT</pubDate>
            <description><![CDATA[ The strength of the Internet is its ability to interconnect all sorts of networks — big data centers, e-commerce websites at small hosting companies, Internet Service Providers (ISP), and Content Delivery Networks (CDN) — just to name a few.  ]]></description>
            <content:encoded><![CDATA[ <p>In a recent <a href="/a-post-mortem-on-this-mornings-incident/">post</a> we discussed how we have been adding resilience to our network.</p><p>The strength of the Internet is its ability to interconnect all sorts of networks — big data centers, <a href="https://www.cloudflare.com/ecommerce/">e-commerce websites</a> at small hosting companies, Internet Service Providers (ISP), and <a href="https://www.cloudflare.com/learning/cdn/what-is-a-cdn/">Content Delivery Networks (CDN)</a> — just to name a few. These networks are either interconnected with each other directly using a dedicated physical fiber cable, through a common interconnection platform called an Internet Exchange (IXP), or they can even talk to each other by simply being on the Internet connected through intermediaries called transit providers.</p><p>The Internet is like the network of roads across a country and navigating roads means answering questions like “How do I get from Atlanta to Boise?” The Internet equivalent of that question is asking how to reach one network from another. For example, as you are reading this on the Cloudflare blog, your web browser is connected to your ISP and packets from your computer found their way across the Internet to Cloudflare’s blog server.</p><p>Figuring out the route between networks is accomplished through a protocol designed 25 years ago (on <a href="http://www.computerhistory.org/atchm/the-two-napkin-protocol/">two napkins</a>) called <a href="https://en.wikipedia.org/wiki/Border_Gateway_Protocol">BGP</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4hf3aJRfh6KcsuPEJfYTSe/27f3c58f9490b6f6d135dafc287ee486/BGP.jpg" />
            
            </figure><p>BGP allows interconnections between networks to change dynamically. It provides an administrative protocol to exchange routes between networks, and allows for withdrawals in the case that a path is no longer viable (when some route no longer works).</p><p>The Internet has become such a complex set of tangled fibers, neighboring routers, and millions of servers that you can be certain there is a server failing or an optical fiber being damaged at any moment, whether it’s in a data center, a trench next to a railroad, or <a href="https://en.wikipedia.org/wiki/2008_submarine_cable_disruption">at the bottom of the ocean</a>. The reality is that the Internet is in a constant state of flux as connections break and are fixed; its incredible strength is that it operates in the face of the real world where conditions constantly change.</p><p>While BGP is the cornerstone of Internet routing, it does not provide first-class mechanisms to automatically deal with these events, nor does it provide tools to manage quality of service in general.</p><p>Although BGP is able to handle the coming and going of networks with grace, it wasn’t designed to deal with Internet brownouts. One common problem is that a connection enters a state where it hasn’t failed, but isn’t working correctly either. This usually presents itself as packet loss: packets enter a connection and never arrive at their destination. The only solution to these brownouts is active, continuous monitoring of the health of the Internet.</p>
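A brownout shows up in measurements as partial packet loss rather than a clean failure. As a toy illustration of what the probes compute:

```python
def loss_rate(sent, received):
    """Fraction of probe packets that never arrived at the destination."""
    return (sent - received) / sent if sent else 0.0

# A clean failure is easy to spot; a brownout is the subtle case where
# the link still "works" but silently drops a slice of the traffic.
hard_failure = loss_rate(1000, 0)   # 100% loss: the link is simply down
brownout = loss_rate(1000, 970)     # 3% loss: degraded, not dead
```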
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2HIfOhD0jflahkGrgaVNZA/3165e9cdb056961f56364bc86568a508/916142_ddc2fd0140_o.gif" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a> <a href="https://www.flickr.com/photos/jurvetson/916142/in/photolist-5Gky-6yWE3-e2fQKB-eWnwZ-6wHvD2-dgZcgm-6KosGR-e3Hopo-px8hdd-7ZC1ZE-6Kpc5g-mwSyiS-mwSmbA-9cEASh-jXe2g-mDdk7X-far2ZD-ajTJc1-jVhjV4-fsq7m3-p7tksy-6Dfpax-7mbpjF-8m3K8i-ryoZoC-7wCB5-687rPk-njcKr4-7wzXXn-4EnS8a-2kafd3-tcCu6-tcCti-6V5RaY-pGCHzT-4yzuYg-9uwrFi-d9CeFw-7BfKzq-7Bc244-7Bc15c-7BfQRh-6JRaSd-7Bc1zp-7BfRgJ-k17mn-6JM5pM-q4R4Kw-aBAv3F-7BfNCY">image</a> by <a href="https://www.flickr.com/photos/jurvetson/">Steve Jurvetson</a></p><p>Again, the metaphor of a system of roads is useful. A printed map may tell you the route from one city to another, but it won't tell you where there's a traffic jam. However, modern GPS applications such as Waze can tell you which roads are congested and which are clear. Similarly, Internet monitoring shows which parts of the Internet are blocked or losing packets and which are working well.</p><p>At Cloudflare we decided to deploy our own mechanisms to react to unpredictable events causing these brownouts. While most events do not fall under our jurisdiction — they are “external” to the Cloudflare network — we have to operate a reliable service by minimizing the impact of external events.</p><p>This is a journey of continual improvement, and it can be deconstructed into a few simple components:</p><ul><li><p>Building an exhaustive and consistent view of the quality of the Internet</p></li><li><p>Building a detection and alerting mechanism on top of this view</p></li><li><p>Building the automatic mitigation mechanisms to ensure the best reaction time</p></li></ul>
    <div>
      <h3>Monitoring the Internet</h3>
      <a href="#monitoring-the-internet">
        
      </a>
    </div>
    <p>Having deployed our network in <a href="/amsterdam-to-zhuzhou-cloudflare-global-network/">a hundred locations</a> worldwide, we are in a unique position to monitor the quality of the Internet from a wide variety of locations. To do this, we are leveraging the probing capabilities of our network hardware and have added some extra tools that we’ve built.</p><p>By collecting data from thousands of automatically deployed probes, we have a real-time view of the Internet’s infrastructure: packet loss in any of our transit provider’s backbones, packet loss on Internet Exchanges, or packet loss between continents. It is salutary to watch this real-time view over time and realize how often parts of the Internet fail and how resilient the overall network is.</p><p>Our monitoring data is stored in real-time in our metrics pipeline powered by a mix of open-source software: <a href="http://zeromq.org">ZeroMQ</a>, <a href="https://prometheus.io">Prometheus</a> and <a href="http://opentsdb.net/">OpenTSDB</a>. The data can then be queried and filtered on a single dashboard to give us a clear view of the state of a specific transit provider, or one specific PoP.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/451Jz0B6feZP4SHDMdUm0q/85f6ccc1e0ca94684c2abc4bef3c30d1/loss_1.gif" />
            
            </figure><p>Above we can see a time-lapse of a transit provider having some packet loss issues.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3g8KEnve4KdnOmZqvTWLrD/0666a47829dce22097cbdf4aa68bfdf5/Screenshot-2016-10-30-16.34.45.png" />
            
            </figure><p>Here we see a transit provider having some trouble on the US West Coast on October 28, 2016.</p>
    <div>
      <h3>Building a Detection Mechanism</h3>
      <a href="#building-a-detection-mechanism">
        
      </a>
    </div>
    <p>We didn’t want to stop here. Having a real-time map of Internet quality puts us in a great position to detect problems and create alerts as they unfold. We have defined a set of triggers that we know are indicative of a network issue, which allow us to quickly analyze and repair problems.</p><p>For example, 3% packet loss from Latin America to Asia is expected under normal Internet conditions and not something that would trigger an alert. However, 3% packet loss between two countries in Europe usually indicates a bigger and potentially more impactful problem, and thus will immediately trigger alerts for our Systems Reliability Engineering and Network Engineering teams to look into the issue.</p><p>Sitting between eyeball networks and content networks, it is easy for us to correlate this packet loss with various other metrics in our system, such as difficulty connecting to customer origin servers (which manifests as Cloudflare error 522) or a sudden decrease in traffic from a local ISP.</p>
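Those region-dependent triggers can be sketched as a baseline lookup. All numbers other than the 3% figures above are illustrative:

```python
# Illustrative sketch of region-aware packet-loss alerting: the acceptable
# baseline differs per path, so the same 3% loss may or may not page anyone.
BASELINE_LOSS = {
    ("latin-america", "asia"): 0.04,  # long intercontinental path: lossier baseline (illustrative)
    ("europe", "europe"): 0.01,       # short intra-continental path: near-zero expected
}
DEFAULT_BASELINE = 0.02  # illustrative fallback for unlisted paths

def should_alert(src_region, dst_region, observed_loss):
    """Alert only when observed loss exceeds the expected baseline for the path."""
    baseline = BASELINE_LOSS.get((src_region, dst_region), DEFAULT_BASELINE)
    return observed_loss > baseline

# 3% loss between Latin America and Asia: within baseline, no alert.
quiet = should_alert("latin-america", "asia", 0.03)
# The same 3% inside Europe: well above baseline, page the on-call teams.
page = should_alert("europe", "europe", 0.03)
```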
    <div>
      <h3>Automatic Mitigation</h3>
      <a href="#automatic-mitigation">
        
      </a>
    </div>
    <p>Receiving valuable and actionable alerts is great; however, reaction time remained hard to compress. Thankfully, in our early years we learned a lot from DDoS attacks. We’ve learned how to detect and auto-mitigate most attacks with our <a href="/introducing-the-bpf-tools/">efficient automated DDoS mitigation pipeline</a>. So naturally we wondered: could we apply what we’ve learned from DDoS mitigation to these generic Internet events? After all, they do exhibit the same characteristics: they’re unpredictable, they’re external to our network, and they can impact our service.</p><p>The next step was to correlate these alerts with automated actions. The actions should reflect what an on-call network engineer would have done given the same information. This includes running some important checks: is the packet loss really external to our network? Is the packet loss correlated to an actual impact? Do we currently have enough capacity to reroute the traffic? When all the stars align, we know we have a case to perform some action.</p><p>All that said, automating actions on network devices turns out to be more complicated than one would imagine. Without going into too much detail, we struggled to find a common language to talk to our equipment with because we’re a multi-vendor network. We decided to contribute to the brilliant open-source project <a href="https://github.com/napalm-automation/napalm">Napalm</a>, coupled it with the automation framework <a href="https://saltstack.com/">Salt</a>, and <a href="http://nanog.org/meetings/abstract?id=2951">improved it to bring us the features we needed</a>.</p><p>We wanted to be able to perform actions such as configuring probes, retrieving their data, and managing complex BGP neighbor configuration regardless of the network device a given PoP was using. With all these features put together into an automated system, we can see the impact of actions it has taken:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7BUEy4XCIg71TuqUQvJqo1/3b49d0cee7acff31949542ad2e3f3665/Screenshot-2016-10-31-11.04.44.png" />
            
            </figure><p>Here you can see one of our transit providers having a sudden problem in Hong Kong. Our system automatically detects the fault and takes the necessary action, which is to disable this link for our routing.</p><p>Our system keeps improving, but it is already making immediate adjustments across our network to <a href="https://www.cloudflare.com/solutions/ecommerce/optimization/">optimize performance</a> every single day.</p>
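The checks described above can be sketched as a decision gate that must fully pass before the system touches routing. This is a hypothetical simplification of the checklist, not our actual Napalm/Salt pipeline:

```python
def should_disable_link(link):
    """Hypothetical decision gate mirroring the on-call checklist: act only
    when the fault is external, the loss correlates with real impact, and
    the remaining capacity can absorb the rerouted traffic."""
    return (
        link["loss_is_external"]           # fault lies beyond our own network
        and link["impact_correlated"]      # e.g. origin-fetch (522) errors rising
        and link["spare_capacity_pct"] >= link["traffic_share_pct"]  # room to reroute
    )

# Illustrative values for a degraded transit link:
hk_transit = {
    "loss_is_external": True,
    "impact_correlated": True,
    "spare_capacity_pct": 40,  # headroom left on the PoP's other links
    "traffic_share_pct": 15,   # share of traffic this link currently carries
}
decision = should_disable_link(hk_transit)  # all checks pass: disable the link
```

If any check fails, the system defers to a human, just as an on-call engineer would pause when the stars don't align.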
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6d2bX0eRLMj91oM9LrN2Ny/21f516b784e01bcdb7326cf3037e3962/Screenshot-2016-10-31-14.31.19-2.png" />
            
            </figure><p>Here we can see the actions our mitigation bot has taken over 90 days.</p><p>The impact of this is that we’ve managed to make the Internet perform better for our customers and reduce the number of errors that they'd see if they weren't using Cloudflare. One way to measure this is how often we're unable to reach a customer's origin. Sometimes origins are completely offline. However, we are increasingly at a point where, if an origin is reachable, we'll find a path to it. You can see the effects of our improvements over the last year in the graph below.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1Xhgv1pZz3h5ql0xp26nv9/d6493cdf3157ec25d3fcf3708cd6c561/522_year-1.png" />
            
            </figure>
    <div>
      <h3>The Future</h3>
      <a href="#the-future">
        
      </a>
    </div>
    <p>While we keep improving this resiliency pipeline every day, we are looking forward to deploying some new technologies to streamline it further: streaming <a href="http://movingpackets.net/2016/01/11/99-problems-and-configuration-and-telemetry-aint-two/">telemetry</a> will permit more real-time collection of our data by moving from a pull model to a push model, and vendor-neutral data models like <a href="http://www.openconfig.net/">OpenConfig</a> will unify and simplify our communication with network devices. We look forward to deploying these improvements as soon as they are mature enough for us to release.</p><p>At Cloudflare our mission is to help build a better Internet. The Internet, though, by its nature and size, is in constant flux — breaking down, being added to, and being repaired at almost any given moment — meaning services are often interrupted and traffic is slowed without warning. By enhancing the reliability and resiliency of this complex network of networks, we think we are one step closer to fulfilling our mission and building a better Internet.</p>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Mitigation]]></category>
            <category><![CDATA[Attacks]]></category>
            <category><![CDATA[Salt]]></category>
            <category><![CDATA[Network]]></category>
            <guid isPermaLink="false">1YLXNdbFOfJePGpwFKQuXa</guid>
            <dc:creator>Jérôme Fleury</dc:creator>
        </item>
        <item>
            <title><![CDATA[How Cloudflare's Architecture Allows Us to Scale to Stop the Largest Attacks]]></title>
            <link>https://blog.cloudflare.com/how-cloudflares-architecture-allows-us-to-scale-to-stop-the-largest-attacks/</link>
            <pubDate>Wed, 26 Oct 2016 12:59:29 GMT</pubDate>
            <description><![CDATA[ The last few weeks have seen several high-profile outages in legacy DNS and DDoS-mitigation services due to large scale attacks. Cloudflare's customers have, understandably, asked how we are positioned to handle similar attacks. ]]></description>
            <content:encoded><![CDATA[ <p>The last few weeks have seen several high-profile outages in legacy <a href="https://www.cloudflare.com/learning/dns/what-is-dns/">DNS</a> and DDoS-mitigation services due to large scale attacks. Cloudflare's customers have, understandably, asked how we are positioned to handle similar attacks.</p><p>While there are limits to any service, including Cloudflare, we are well architected to withstand these recent attacks and continue to scale to stop the larger attacks that will inevitably come. We are, <a href="https://twitter.com/MiraiAttacks">multiple times per day</a>, mitigating the very botnets that have been in the news. Based on the attack data that has been released publicly, and what has been shared with us privately, we have been successfully mitigating attacks of a similar scale and type without customer outages.</p><p>I thought it was a good time to talk about how Cloudflare's architecture is different than most legacy DNS and DDoS-mitigation services and how that's helped us keep our customers online in the face of these extremely high volume attacks.</p>
    <div>
      <h3>Analogy: How Databases Scaled</h3>
      <a href="#analogy-how-databases-scaled">
        
      </a>
    </div>
    <p>Before delving into our architecture, it's worth taking a second to think about another analogous technology problem that is better understood: scaling databases. From the mid-1980s, when relational databases started taking off, through the early 2000s, the way companies thought of scaling their databases was to buy bigger hardware. The game was: buy the biggest database server you could afford, start filling it with data, and then hope a newer, bigger server you could afford was released before you ran out of room. Hardware companies responded with more and more exotic, database-specific hardware.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/37mOkqUk3hRkHBk6ftT82w/8482b91de1486fc610c94a2053e5fe43/IBM-z13-Mainframe-1.png" />
            
            </figure><p>Meet the IBM z13 mainframe (source: IBM)</p><p>At some point, the bounds of a box couldn't contain all the data some organizations wanted to store. Google is a famous example. Back when the company was a startup, they didn't have the resources to purchase the largest database servers. Nor, even if they did, could the largest servers store everything they wanted to index — which was, literally, everything.</p><p>So, rather than going the traditional route, Google wrote software that allowed many cheap, commodity servers to work together as if they were one large database. Over time, as Google developed more services, the software became efficient at distributing load across all the machines in Google's network to maximize utilization of network, compute, and storage. And, as Google's needs grew, they just added more commodity servers — allowing them to linearly scale resources to meet their needs.</p>
    <div>
      <h3>Legacy DNS and DDoS Mitigation</h3>
      <a href="#legacy-dns-and-ddos-mitigation">
        
      </a>
    </div>
    <p>Compare this with the way legacy DNS and DDoS mitigation services mitigate attacks. Traditionally, the way to stop an attack was to buy or build a big box and use it to filter incoming traffic. If you were to dig into the technical details of most legacy DDoS mitigation service vendors you'd find hardware from companies like Cisco, Arbor Networks, and Radware clustered together into so-called "scrubbing centers."</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4ze9dBcN6b89VeM2HRFy16/b1d09c73edfec469752f05eccce30b95/1280px-WWTP_Antwerpen-Zuid.jpg" />
            
            </figure><p><a href="http://creativecommons.org/licenses/by-sa/3.0">CC BY-SA 3.0</a> sewage treatment <a href="https://en.wikipedia.org/wiki/Wastewater_treatment#/media/File:WWTP_Antwerpen-Zuid.jpg">image</a> by <a href="https://commons.wikimedia.org/wiki/User:Annabel">Annabel</a></p><p>Just like in the old database world, there were tricks to get these behemoth mitigation boxes to (sort of) work together, but they were kludgy. Often the physical limits of the number of packets that a single box could absorb became the effective limit on the total volume that could be mitigated by a service provider. And, in very large DDoS attacks, much of the attack traffic will never reach the scrubbing center because, with only a few locations, upstream ISPs become the bottleneck.</p><p>The expense of the equipment meant that it was not cost-effective to distribute scrubbing hardware broadly. If you were a DNS provider, how often would you really get attacked? How could you justify investing in expensive mitigation hardware in every one of your data centers? Even if you were a legacy DDoS vendor, typically your service was only provisioned when a customer came under attack, so it never made sense to have capacity much beyond a certain margin over the largest attack you'd previously seen. It seemed rational that any investment beyond that was a waste, but that conclusion is proving ultimately fatal to the traditional model.</p>
    <div>
      <h3>The Future Doesn't Come in a Box</h3>
      <a href="#the-future-doesnt-come-in-a-box">
        
      </a>
    </div>
    <p>From the beginning at Cloudflare, we saw our infrastructure much more like how Google saw their database. In our early days, the traditional DDoS mitigation hardware vendors tried to pitch us to use their technology. We even considered building mega boxes ourselves and using them just to scrub traffic. It seemed like a fascinating technical challenge, but we realized that it would never be a scalable model.</p><p>Instead, we started with a very simple architecture. Cloudflare's first racks had only three components: router, switch, server. Today we’ve made them even simpler, often dropping the router entirely and using switches that can also handle enough of the routing table to route packets over the geographic region the data center serves.</p><p>Rather than using load balancers or dedicated mitigation hardware, which could become bottlenecks in an attack, we wrote software that uses BGP, the fundamental routing protocol of the Internet, to <a href="/cloudflares-architecture-eliminating-single-p/">distribute load geographically and also within each data center in our network</a>. Critical to our model: every server in every rack is able to answer every type of request. Our software dynamically allocates load based on what is needed for a particular customer at a particular time. That means that we automatically spread load across literally tens of thousands of servers during large attacks.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/372zDV5K3g5yUvtfcSmrPv/e74c8e3abf85557d3c71ac35d4ca1d74/Graphen.jpg" />
            
            </figure><p>Graphene: a simple architecture that’s 100 times stronger than the best steel (credit: <a href="https://commons.wikimedia.org/wiki/File:Graphen.jpg">Wikipedia</a>)</p><p>It has also meant that we can cost-effectively continue to invest in our network. If Frankfurt needs 10 percent more capacity, we can ship it 10 percent more servers rather than having to make the step-function decision of whether to buy or build another Colossus Mega Scrubber™ box.</p><p>Since every core in every server in every data center can help mitigate attacks, it means that with each new data center we bring online we get better and better at stopping attacks nearer the source. In other words, the solution to a massively distributed botnet is a <a href="https://www.cloudflare.com/network/">massively distributed network</a>. This is actually how the Internet was meant to work: distributed strength, not focused brawn within a few scrubbing locations.</p>
    <div>
      <h3>How We Made DDoS Mitigation Essentially Free</h3>
      <a href="#how-we-made-ddos-mitigation-essentially-free">
        
      </a>
    </div>
    <p>The efficient use of resources isn't only a matter of capital expenditures; it extends to operating expenditures as well. Because we use the same equipment and networks to provide all the functions of Cloudflare, we rarely have any additional bandwidth costs associated with stopping an attack. Bear with me for a second, because, to understand this, you need to understand a bit about how we buy bandwidth.</p><p>We pay for bandwidth from transit providers on an aggregated basis billed monthly at the 95th percentile of the greater of ingress vs. egress. Ingress is just network speak for traffic being sent into our network. Egress is traffic being sent out from our network.</p><p>In addition to being a DDoS mitigation service, Cloudflare also offers other functions including caching. The nature of a cache is that you should always have more traffic going out from your cache than coming in. In our case, during normal circumstances, we have many times more egress (traffic out) than ingress (traffic in).</p><p>Large DDoS attacks drive up our ingress but don't affect our egress. However, even in a <a href="/deep-inside-a-dns-amplification-ddos-attack/">very large attack</a>, it is extremely rare that ingress exceeds egress. Because we only pay for the greater of ingress vs. egress, and because egress is always much higher than ingress, we effectively have an enormous amount of zero-cost bandwidth with which to soak up attacks.</p><p>As use of our services increases, the amount of capacity to stop attacks increases proportionately. People wonder how we can provide DDoS mitigation at a fixed fee regardless of the size of the attack; the answer is that attacks don't increase the largest of our unit costs. 
And, while legacy providers have stated that their offering pro bono DDoS mitigation would cost them millions, we’re able to protect politically and artistically important sites against huge attacks for free through <a href="https://www.cloudflare.com/galileo/">Project Galileo</a> without it breaking the bank.</p>
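<p>To make the billing arithmetic concrete, here is a minimal sketch of 95th-percentile ("burstable") transit billing, under one common interpretation: each direction is sampled (say, every 5 minutes), the top 5% of samples is discarded, and the bill tracks the greater of the two remaining peaks. The traffic figures, units, and function names are illustrative assumptions, not Cloudflare's actual billing code.</p>

```python
# Sketch of 95th-percentile billing on the greater of ingress vs. egress.
# All sample values (in Mbps) are invented for illustration.

def p95(samples):
    """Return the 95th-percentile sample: sort, then drop the top 5%."""
    s = sorted(samples)
    idx = int(len(s) * 0.95) - 1
    return s[max(idx, 0)]

# A cache-heavy month: egress (traffic out) dwarfs ingress (traffic in),
# so even a large attack spike on ingress never moves the bill.
ingress = [10, 12, 11, 9, 500, 10, 13, 11, 10, 12]   # one big attack spike
egress = [900, 950, 920, 910, 930, 940, 925, 915, 905, 935]

bill = max(p95(ingress), p95(egress))
print(bill)   # 940: set entirely by egress, despite the 500 Mbps ingress spike
```

Because the bill is set by the higher direction, the ingress spike is effectively free capacity for absorbing the attack.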
    <div>
      <h3>Winning the Arms Race</h3>
      <a href="#winning-the-arms-race">
        
      </a>
    </div>
    <p>Cloudflare is the only DNS provider that was designed, from the beginning, to mitigate large scale DDoS attacks. Just as DDoS attacks are by their very nature distributed, Cloudflare’s DDoS mitigation system is distributed across our massive global network.</p><p>There is no doubt that we are in an arms race with attackers. However, we are well positioned technically and economically to win that race. Against most legacy providers, attackers have an advantage: providers' costs are high because they have to buy expensive boxes and bandwidth, while attackers' costs are low because they use hacked devices. That’s why our secret sauce is the software that spreads our load across our massively distributed network of commodity hardware. By keeping our costs low we are able to continue to grow our capacity efficiently and stay ahead of attacks.</p><p>Today, we believe Cloudflare has more capacity to stop attacks than the publicly announced capacity of all our competitors — combined. And we continue to expand, opening nearly a new data center a week. The good news for our customers is that we’ve designed Cloudflare in such a way that we can continue to cost effectively scale our capacity as attacks grow. There are limits to any service, and we remain ever vigilant for new attacks, but we are confident that our architecture is ultimately the right way to stop whatever comes next.</p><p>PS - Want to work at our scale on some of the hardest problems the Internet faces? We’re <a href="https://www.cloudflare.com/join-our-team/">hiring</a>.</p> ]]></content:encoded>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Attacks]]></category>
            <category><![CDATA[Mitigation]]></category>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">2JNf9MOLtjCFtr1DStPMWU</guid>
            <dc:creator>Matthew Prince</dc:creator>
        </item>
        <item>
            <title><![CDATA[Rate Limiting: Live Demo]]></title>
            <link>https://blog.cloudflare.com/traffic-control-live-demo/</link>
            <pubDate>Fri, 30 Sep 2016 19:56:00 GMT</pubDate>
            <description><![CDATA[ Cloudflare helps customers control their own traffic at the edge. One of two products that we introduced to empower customers to do so is Cloudflare Rate Limiting. ]]></description>
            <content:encoded><![CDATA[ <p>Cloudflare helps customers control their own traffic at the edge. One of two <a href="/cloudflare-traffic/">products that we introduced</a> to empower customers to do so is <a href="https://www.cloudflare.com/traffic-control/">Cloudflare Rate Limiting</a>*.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4NtFQrwczwJFS6c5ko0bUk/08f0544ed11c6d802aa08ad234d79a71/speed-limit-10.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a> <a href="https://www.flickr.com/photos/brhefele/6553028503/">image</a> by <a href="https://www.flickr.com/photos/brhefele/">Brian Hefele</a></p><p>Rate Limiting allows a customer to rate limit, shape or block traffic based on the rate of requests per client IP address, cookie, authentication token, or other attributes of the request. Traffic can be controlled on a per-URI (with wildcards for greater flexibility) basis giving pinpoint control over a website, application, or API.</p><p>Cloudflare has been <a href="https://en.wikipedia.org/wiki/Eating_your_own_dog_food">dogfooding</a> Rate Limiting to add more granular controls against Layer 7 DOS and brute-force attacks. For example, we've experienced attacks on cloudflare.com from more than 4,000 IP addresses sending 600,000+ requests in 5 minutes to the same URL but with random parameters. These types of attacks send large volumes of HTTP requests intended to bring down our site or to crack login passwords.</p><p>Rate Limiting protects websites and APIs from similar types of bad traffic. By leveraging our massive network, we are able to process and enforce rate limiting near the client, shielding the customer's application from unnecessary load.</p><p>To make this more concrete, let's look at a live demonstration rule for cloudflare.com. Multiple rules may be used and combined to great effect -- this is just a limited example.</p><p>Read on, and then test it yourself.</p>
    <div>
      <h3>Creating the rule</h3>
      <a href="#creating-the-rule">
        
      </a>
    </div>
    <p>Imagine an endpoint that is resource-intensive. To maintain availability, we want to protect it from high-volume request rates - like those from an aggressive bot or attacker.</p><p><b>URL</b></p><p><code>*.cloudflare.com/rate-limit-test</code></p><p>Rate Limiting allows for * wildcards to give more flexibility. An API with multiple endpoints might use a pattern of <code>api.example.com/v2/*</code>. With that pattern, all resources under <code>/v2</code> would be protected by the same rule.</p><p><b>Threshold</b></p><p>We set this demonstration rule to 10 requests per minute, which is too sensitive for a real web application, but allows a curious user refreshing their browser ten times to see Rate Limiting in action.</p><p><b>Action</b></p><p>We set this value to <code>block</code>, which means that once an IP address triggers the rule, all traffic from that IP address will be blocked at the edge and served with a default 429 HTTP error code.</p><p>Other possible choices include <code>simulate</code>, which means no action is taken, but analytics indicate which requests would have been mitigated, helping customers evaluate the potential impact of a given rule.</p><p><b>Timeout</b></p><p>This is the duration of the mitigation once the rule has been triggered. In this example, an offending IP address will be blocked for 1 minute.</p><p><b>Response body type</b></p><p>This type was set to <code>HTML</code> in the demo so that Rate Limiting returns a web page to mitigated requests. For an API endpoint, the response body type could return JSON.</p><p><b>Response body</b></p><p>The response body can be anything you want. Refresh the link below 10 times very quickly to see our choice for this demonstration rule.</p><p><a href="https://www.cloudflare.com/rate-limit-test"><b>https://www.cloudflare.com/rate-limit-test</b></a></p>
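<p>The demonstration rule above (10 requests per minute per IP, <code>block</code>, a 1-minute timeout, and a 429 response) can be modeled as a toy fixed-window limiter. This is a sketch under our own assumptions - the names, the fixed-window counting, and the in-memory state are illustrative, not Cloudflare's edge implementation.</p>

```python
import time
from collections import defaultdict

# Toy fixed-window model of the demo rule: 10 requests per 60-second
# window per client IP, then a 60-second block served as HTTP 429.

THRESHOLD = 10   # requests allowed per window
WINDOW = 60      # window length in seconds
TIMEOUT = 60     # block duration once the rule triggers

counters = defaultdict(lambda: [0, 0.0])   # ip -> [count, window_start]
blocked_until = {}                         # ip -> timestamp when block lifts

def handle(ip, now=None):
    """Return the HTTP status for one request: 200 allowed, 429 limited."""
    now = time.monotonic() if now is None else now
    if blocked_until.get(ip, 0.0) > now:
        return 429                          # still inside the mitigation timeout
    count, start = counters[ip]
    if now - start >= WINDOW:               # window expired: start a new one
        count, start = 0, now
    count += 1
    counters[ip] = [count, start]
    if count > THRESHOLD:                   # rule triggered: block the IP
        blocked_until[ip] = now + TIMEOUT
        return 429
    return 200

# Ten rapid refreshes pass; the eleventh and later requests are blocked.
print([handle("203.0.113.9", now=float(t)) for t in range(12)])
```

Once the timeout elapses, the next request from that IP starts a fresh window and is served normally again.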
    <div>
      <h3>Other possible configurations</h3>
      <a href="#other-possible-configurations">
        
      </a>
    </div>
    <p>We could have specified a <b>Method</b>. If we only cared to rate limit POST requests, we could adjust the rule to do so. This rule could be used for a login page, where a high frequency of POSTs from the same IP is potentially suspicious.</p><p>We also could have specified a <b>Response Code</b>. If we only wanted to rate limit IPs which were consistently failing to authenticate, we could create the rule to trigger only after a certain threshold of 403s has been served. Once an IP is flagged, perhaps because it was pounding a login endpoint with incorrect credentials, that client IP could be blocked from hitting either that endpoint or the whole site.</p><p>We will expand the matching criteria, adding attributes such as headers or cookies. We will also extend the mitigation options to include CAPTCHA or other challenges. This will give our users even more flexibility and power to protect their websites and API endpoints.</p>
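<p>The Response Code variant can be sketched the same way: count only failed responses per client IP and flag the IP once a failure threshold is crossed. The names and the threshold value here are illustrative assumptions, not product code.</p>

```python
# Count 403 responses per client IP; flag the IP once too many accumulate.

FAIL_THRESHOLD = 5   # 403s tolerated before the rule triggers
failures = {}        # ip -> consecutive 403 count

def record_response(ip, status):
    """Record one origin response; return True if the IP should now be blocked."""
    if status == 403:
        failures[ip] = failures.get(ip, 0) + 1
    else:
        failures.pop(ip, None)   # any successful response resets the count
    return failures.get(ip, 0) > FAIL_THRESHOLD

# Five failed logins are tolerated; the sixth trips the rule.
print([record_response("203.0.113.5", 403) for _ in range(6)])
```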
    <div>
      <h3>Early Access</h3>
      <a href="#early-access">
        
      </a>
    </div>
    <p>We'd love to have you try Rate Limiting. Read more and <a href="https://www.cloudflare.com/traffic-control">sign up for Early Access</a>.</p><p><i>*Note: This post was updated 4/13/17 to reflect the current product name. All references to Traffic Control have been changed to Rate Limiting.</i></p> ]]></content:encoded>
            <category><![CDATA[Traffic]]></category>
            <category><![CDATA[Rate Limiting]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Mitigation]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">5jRNF30iNz47tFcZ7hvNoY</guid>
            <dc:creator>Timothy Fong</dc:creator>
        </item>
        <item>
            <title><![CDATA[400Gbps: Winter of Whopping Weekend DDoS Attacks]]></title>
            <link>https://blog.cloudflare.com/a-winter-of-400gbps-weekend-ddos-attacks/</link>
            <pubDate>Thu, 03 Mar 2016 02:32:00 GMT</pubDate>
            <description><![CDATA[ Over the last month, we’ve been watching some of the largest distributed denial of service (DDoS) attacks ever seen unfold. As CloudFlare has grown we've brought on line systems capable of absorbing and accurately measuring attacks. ]]></description>
            <content:encoded><![CDATA[ <p>Over the last month, we’ve been watching some of the largest distributed denial of service (DDoS) attacks ever seen unfold. As CloudFlare has grown, we've brought online systems capable of absorbing and <i>accurately measuring</i> attacks. Since we don't need to resort to crude techniques to block traffic, we can measure and filter attacks with accuracy. Our systems sort bad packets from good, keep websites online and keep track of attack packet rates and bits per second.</p><p>The current spate of large attacks is all layer 3 (L3) DDoS. Layer 3 attacks consist of a large volume of packets hitting the target network, and the aim is usually to overwhelm the target network hardware or connectivity.</p><p>L3 attacks are dangerous because most of the time the only solution is to acquire large network capacity and buy beefy networking hardware, which is simply not an option for most independent website operators. Or, faced with huge packet rates, some providers just turn off connections or entirely block IP addresses.</p>
    <div>
      <h3>A Typical Day At CloudFlare</h3>
      <a href="#a-typical-day-at-cloudflare">
        
      </a>
    </div>
    <p>Historically, L3 attacks were the biggest headache for CloudFlare. Over the last two years, we’ve automated almost all of our L3 attack handling and these automatic systems protect CloudFlare customers worldwide 24 hours a day.</p><p>This chart shows our L3 DoS mitigation during the last quarter of 2015:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/38kFVHuPA0mAiJlfI0vINy/fab71bd2587471c6f1714d1cdf98e8e4/Daily-DOS-Events.png" />
            
            </figure><p>The y-axis is the number of individual "DoS events" that we responded to, which is about 20-80 on a typical day. Each of these DoS events are automatically triggered by an attack on one or more of our customers.</p>
    <div>
      <h3>Recent DDoS Attacks Were Larger. Much Larger</h3>
      <a href="#recent-ddos-attacks-were-larger-much-larger">
        
      </a>
    </div>
    <p>Most of the mitigated DoS events are small. When a big attack happens, it usually shows up as a large number of separate events in our system.</p><p>During the last month or so, we’ve been dealing (mostly automatically) with very large attacks. Here is the same chart, but including the first quarter of 2016. Notice the scale:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2vfkSAklwsTtnkIY3jjkgF/cb7f0270a3d6944b804b7934852b8921/Daily-DOS-Events-in-Feb.png" />
            
            </figure><p>That’s about a 15x increase in individual DoS events. These new attacks are interesting for a couple of reasons. First, the spikes align with the weekends. It seems the attackers are busy with something else during the week. Second, they are targeting a couple of fairly benign websites—this demonstrates that <i>anybody</i> can become the target of a large attack. Third, the overall volume of the attack is enormous.</p><p>Let me elaborate on this last point.</p>
    <div>
      <h3>BGP Black Holes</h3>
      <a href="#bgp-black-holes">
        
      </a>
    </div>
    <p>When operating at a smaller scale, it’s not unusual for a DDoS attack to overwhelm the capacity of the target network. This causes network congestion and forces the operators of the attacked network to <a href="https://en.wikipedia.org/wiki/Black_hole_(networking)">black hole</a> the attacked IP addresses, pretty much removing them from the Internet.</p><p>After that happens, it’s impossible to report the volume of the attack. The attack traffic will disappear in a “black hole” and is invisible to the operators of the attacked network. This is why reporting on big attacks is difficult—with BGP blackholing, it’s impossible to assess the true scale of the malicious traffic.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4vmypmoCP53Cc9fTfXlnf2/8c6fa9829fabc44b6201ee5450303887/2844659652_308ecc230f_z.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a> <a href="https://www.flickr.com/photos/mwichary/2844659652/in/photolist-5knCd5-5knH5b-6mJq7x-4HaGyf-6mJvLH-4HrEVJ-81DPyL-6mNyCo-6Va71U-81AEVR-pPFFHL-4w3qa7-6V63qT-6mJkh2-6mNCeJ-36k8qA-pPwRs3-4M2GXi-4HrE6u-4HsdeB-4JD8hf-4JD6nw-4QqseF-6mJoog-pNjmq5-4HJQz6-qa29gR-3h8GjF-J8M2b-4Hwxgh-ex6kUp-4HJMaX-4HP4kb-8DSYWk-nqEz3T-4G8kLK-4HsiHa-5knEos-6mJm94-eaUcwV-6mNvCb-4HNSp3-4HscFR-4JyQnR-4HJG9K-4HJRx6-5knD6Q-5kipKe-4x3H5P-6mNF7E">image</a> by <a href="https://www.flickr.com/photos/mwichary/">Marcin Wichary</a></p><p>We’ve encountered very large attacks in the past, and we didn’t often have to resort to blackholing. This allows us to report in detail on the attack volume we see. Fortunately, the same is true this time around—the size of our network and the efficiency of our systems allowed us to simply absorb the attacks. This is the only reason we’re able to provide the following metrics.</p>
    <div>
      <h3>How Big Was This DDoS Attack?</h3>
      <a href="#how-big-was-this-ddos-attack">
        
      </a>
    </div>
    <p>DDoS attacks are measured in two dimensions: the number of malicious packets per second (pps) and the attack bandwidth in bits per second (bps).</p>
    <div>
      <h4>Rate of Packets (pps)</h4>
      <a href="#rate-of-packets-pps">
        
      </a>
    </div>
    <p>The packets per second metric is important, since the processing power required on a router rises proportionally with the pps count. Whenever an attack overwhelms a router’s processing power, we will see a frantic error message like this:</p>
            <pre><code>PROBLEM/CRITICAL:
edge21.sin01.cloudflare.cc router PFE hardware drops
Info cell drops are increasing at a rate of 210106.35/s.
Fabric drops are increasing at a rate of 329678.81/s.</code></pre>
            <p>It’s not uncommon for attacks against CloudFlare to reach 100Mpps. The recent attacks were larger and peaked at 180Mpps. Here’s a chart of the attack packets per second that we received over the last month:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2vPGAQuIemYVcO1CDr0sPH/a96b8dde9622bd782ab59cf7ea57cf33/attack-pps.png" />
            
            </figure><p>There aren’t many networks in the world that could sustain this rate of attack packets, but our hardware was able to keep up with all the malicious traffic. Our networking team does an excellent job at keeping CloudFlare’s networking hardware up to the task.</p>
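<p>The two metrics are linked by packet size. As a back-of-the-envelope illustration, assume the flood is made of minimum-size Ethernet frames: 64 bytes per frame plus 20 bytes of preamble and inter-frame gap, i.e. 84 bytes of wire time per packet. Real attack packets are often larger, so treat this as a rough lower bound, not a measurement of the attacks described here.</p>

```python
# Rough conversion from a packet rate to the bandwidth it occupies.

WIRE_BYTES_MIN = 64 + 20   # bytes of wire time per minimum-size Ethernet packet

def pps_to_gbps(pps, wire_bytes=WIRE_BYTES_MIN):
    """Convert a packet rate into the bandwidth it consumes on the wire."""
    return pps * wire_bytes * 8 / 1e9

# A 180 Mpps flood of minimum-size packets occupies roughly 121 Gbps.
print(round(pps_to_gbps(180e6), 1))
```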
    <div>
      <h4>Volume in bits (bps)</h4>
      <a href="#volume-in-bits-bps">
        
      </a>
    </div>
    <p>The other interesting parameter is size of the attack in gigabits per second. Some large DDoS attacks attempt to saturate the network capacity (bps), rather than router processing power (pps). If that happens, we see alerts like this:</p>
            <pre><code>PROBLEM/CRITICAL:
edge112.tel01.cloudflare.cc router interfaces
ae-1/2/2 (HardLayer) in: 81092mbps (81.10%)</code></pre>
            <p>We saw this warning some time ago when a link got quite full. In fact, in recent weeks a couple of our fat 100Gbps links got quite busy, hitting a ceiling at 77Gbps on inbound:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/12hbZzVMslMyjDBIINRKGn/648750ef9a8610e093581ee407f7eb8b/76g-ceiling.png" />
            
            </figure><p>We speculate the attack against this data center was larger than 77Gbps, but that congestion occurred on the other side of the network—close to the attackers. We’re investigating this with our Internet providers.</p><p>The recent attacks peaked at around 400Gbps of aggregate inbound traffic. The peak wasn't a one-off spike; it lasted for a couple of hours!</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3R7AfavSaYuoe8bk7UFRDc/07ba4f9504f567bad8de9b5464f1b901/attack-bps.png" />
            
            </figure><p>Even during the peak attack traffic, our network wasn’t congested and systems operated normally. Our automatic mitigation software was able to sort the good from the bad packets.</p>
    <div>
      <h3>Final Words</h3>
      <a href="#final-words">
        
      </a>
    </div>
    <p>Over the past weeks, we encountered a series of large DDoS attacks. At peak, our systems reported over 400Gbps of incoming traffic, which is amongst the largest by aggregate volume that we’ve ever seen. The attack was mostly absorbed by our automatic mitigation systems.</p><p>With each attack, we’re improving our automatic attack mitigation system. In addition, our network team constantly keeps an eye on the network, working to identify new bottlenecks. Without our automatic mitigation system and network team, dealing with attacks of this size would not be possible.</p><p>You might think that as CloudFlare grows, the relative scale of attacks will shrink in comparison to legitimate traffic. The funny thing is, the opposite is true: every time we upgrade our network capacity, we see larger attacks because we don’t have to resort to blackholing. With more capacity, we’re able to withstand and accurately measure even bigger attacks.</p><p>Interested in dealing with the largest DDoS attacks in the world? <a href="https://www.cloudflare.com/join-our-team/">We’re hiring</a> in San Francisco, London, and Singapore.</p> ]]></content:encoded>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Attacks]]></category>
            <category><![CDATA[Mitigation]]></category>
            <category><![CDATA[Reliability]]></category>
            <guid isPermaLink="false">6xHWTVYf4Bv4uWSAKl2uBr</guid>
            <dc:creator>Marek Majkowski</dc:creator>
        </item>
        <item>
            <title><![CDATA[Announcing Virtual DNS: DDoS Mitigation and Global Distribution for DNS Traffic]]></title>
            <link>https://blog.cloudflare.com/announcing-virtual-dns-ddos-mitigation-and-global-distribution-for-dns-traffic/</link>
            <pubDate>Tue, 10 Mar 2015 12:59:09 GMT</pubDate>
            <description><![CDATA[ It’s 9am and CloudFlare has already mitigated three billion malicious requests for our customers today. Six out of every one hundred requests we see are malicious, and increasingly, more of those bad requests are targeting DNS nameservers.

 ]]></description>
            <content:encoded><![CDATA[ <p>It’s 9am and CloudFlare has already mitigated three billion malicious requests for our customers today. Six out of every one hundred requests we see are malicious, and increasingly, more of those bad requests are targeting DNS nameservers.</p><p>DNS is the phone book of the Internet and fundamental to the usability of the web, but it is also a serious weak link in Internet security. One of the ways CloudFlare is trying to make DNS more secure is by implementing <a href="/dnssec-an-introduction/">DNSSEC</a>, cryptographic authentication for DNS responses. Another way is <a href="https://www.cloudflare.com/virtual-dns">Virtual DNS</a>, the authoritative DNS proxy service we are introducing today.</p><p>Virtual DNS provides CloudFlare’s DDoS mitigation and global distribution to DNS nameservers. DNS operators need performant, resilient infrastructure, and we are offering ours, the <a href="http://dnsperf.com">fastest</a> of any provider, to any organization’s DNS servers.</p><p>Many organizations have legacy DNS infrastructure that is difficult to change. The hosting industry is a key example of this. A host may have given thousands of clients a set of nameservers but now realize that they don't have the performance or defensibility that their clients need.</p><p>Virtual DNS means that the host can get the benefits of a global, modern DNS infrastructure without having to contact every customer and get them to update their name servers.</p><p>With legacy infrastructure blocking a host from deploying modern cloud-based security services, DNS providers, even if they are securing their customers' websites, may have a massive single point of failure: their own nameservers.</p>
    <div>
      <h3>A Quick Brief on DDoS</h3>
      <a href="#a-quick-brief-on-ddos">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6HxpoxFqnYJrkoSvaqgfvn/8eb71be79ca601b0c48f704840001f57/Diner-Dash-apk-mod.jpg" />
            
            </figure><p><i>Source: </i><a href="http://androidcorps.com"><i>Android Corps</i></a></p><p>DDoS stands for <a href="/ddos-prevention-protecting-the-origin/">Distributed Denial of Service</a>, and works much like the 2004 video game Diner Dash. In each case, the server is expected to handle more and more requests (for food, in the case of Diner Dash, and for data, in the case of web servers) until the server is so overwhelmed that it fails to answer at all.</p><p>A successful DDoS attack on a provider's nameservers will take every website with DNS records on those nameservers offline. For larger providers, this could be hundreds of thousands or millions of websites that depend on those nameservers.</p>
    <div>
      <h3>Introducing Virtual DNS</h3>
      <a href="#introducing-virtual-dns">
        
      </a>
    </div>
    <p>Today, CloudFlare introduces Virtual DNS, leveraging its global DNS and proxying infrastructure to provide performance and security for any nameserver by acting as authoritative for its domains.</p><p>With Virtual DNS, DNS queries for the provider's records are responded to by the nearest CloudFlare edge location. If the proper DNS response is available in CloudFlare's cache, CloudFlare will return the response to the visitor, saving bandwidth at the origin nameserver.</p><p>If the DNS response is not available in cache, CloudFlare will query one of the provider's nameservers in the background to fetch the DNS response and send it back to the visitor. Simultaneously, that response will be temporarily cached on CloudFlare to be automatically returned when the next query for that record comes along. The caching of records at the edge makes CloudFlare one of the <a href="http://dnsperf.com">fastest DNS providers</a> worldwide.</p><p>To protect against attacks, malicious requests to the nameservers will be identified and blocked at CloudFlare’s edge before those requests ever make it to the provider's DNS infrastructure.</p><p>A simple representation of this communication can be seen below:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3WozKJkxrcKaEjvATEhlJw/25af330a0de39158fd5823172cdbcd0c/virtual-dns-03-2.png" />
            
            </figure>
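<p>The resolution flow above can be sketched in a few lines: answer from the edge cache when a fresh record exists, otherwise ask the origin nameserver and cache its answer for the record's TTL. The class and function names here are invented for illustration, and the real service refreshes records in the background rather than inline, as described in the post.</p>

```python
import time

# Minimal cache-then-origin resolver, a sketch of the Virtual DNS flow.

class DnsCacheProxy:
    def __init__(self, query_origin, clock=time.monotonic):
        self.query_origin = query_origin   # (name, rtype) -> (answer, ttl_seconds)
        self.clock = clock
        self.cache = {}                    # (name, rtype) -> (answer, expires_at)

    def resolve(self, name, rtype="A"):
        key = (name, rtype)
        hit = self.cache.get(key)
        if hit is not None and hit[1] > self.clock():
            return hit[0]                             # served from the edge cache
        answer, ttl = self.query_origin(name, rtype)  # fall back to the origin
        self.cache[key] = (answer, self.clock() + ttl)
        return answer

# The origin is consulted only once; the repeat query is a cache hit.
origin_calls = []
def origin(name, rtype):
    origin_calls.append((name, rtype))
    return ("192.0.2.1", 300)   # (answer, TTL) -- example values

proxy = DnsCacheProxy(origin)
proxy.resolve("example.com")
proxy.resolve("example.com")
print(len(origin_calls))   # 1
```

The cache hit is what saves bandwidth at the origin nameserver: repeated queries for popular records never leave the edge.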
    <div>
      <h3>Additional Security</h3>
      <a href="#additional-security">
        
      </a>
    </div>
    <p>Virtual DNS provides two additional layers of security through the CloudFlare proxy:</p><p>First, if the origin nameserver is knocked offline while its DNS records are cached on CloudFlare, CloudFlare will keep the records in the cache and continue to answer for them. This provides DNS answers even when the origin nameserver is unreachable, while CloudFlare automatically checks in the background for the origin's return or fails over to designated origins.</p><p>Second, Virtual DNS masks the true origin IP addresses of the provider's nameservers behind CloudFlare’s IP addresses. Visitors and attackers alike only see CloudFlare’s IP addresses when requesting answers, keeping customer nameservers safe from being targeted directly.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5wYqf3gCLp3ySKp2YPgMfY/0b19268a3a775342147e0c4faaad2fed/virtual-dns-only.png" />
            
            </figure>
    <div>
      <h3>Virtual DNS Rollout</h3>
      <a href="#virtual-dns-rollout">
        
      </a>
    </div>
    <p>We are currently rolling out Virtual DNS support. Organizations interested in enabling Virtual DNS should <a href="https://www.cloudflare.com/enterprise-service-request">contact our sales team</a>.</p><p>Over the past year, we’ve been testing the product with hosting providers, <a href="https://www.cloudflare.com/learning/dns/glossary/what-is-a-domain-name-registrar/">registrars</a> and some enterprises with very positive results.</p><p>DigitalOcean, for example, put their nameservers behind Virtual DNS in July 2014, and is now supporting 10K requests per second of 100% clean traffic. They <a href="https://www.cloudflare.com/case-studies-digital-ocean">report</a> that they haven’t seen malicious traffic reach their nameservers since.</p><p>Maintaining custom DNS infrastructure is hard and expensive, and Virtual DNS makes it more accessible. Any enterprise can use CloudFlare Virtual DNS to deliver answers to the edge, with high performance anywhere in the world, saving bandwidth costs by caching answers, and stopping malicious traffic.</p> ]]></content:encoded>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Mitigation]]></category>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Attacks]]></category>
            <guid isPermaLink="false">2K4OacUkqKlVAQrlX7d5tL</guid>
            <dc:creator>Dani Grant</dc:creator>
        </item>
        <item>
            <title><![CDATA[Understanding and mitigating NTP-based DDoS attacks]]></title>
            <link>https://blog.cloudflare.com/understanding-and-mitigating-ntp-based-ddos-attacks/</link>
            <pubDate>Thu, 09 Jan 2014 16:00:00 GMT</pubDate>
            <description><![CDATA[ Over the last couple of weeks you may have been hearing about a new tool in the DDoS arsenal: NTP-based attacks. These have become popular recently and caused trouble for some gaming web sites and service providers. ]]></description>
            <content:encoded><![CDATA[ <p>Over the last couple of weeks you may have been hearing about a new tool in the DDoS arsenal: NTP-based attacks. These have become popular recently and caused trouble for some gaming web sites and service providers. We'd long thought that <a href="https://github.com/cloudflare/jgc-talks/blob/master/Virus_Bulletin/Secure_2013_and_Virus_Bulletin_2013/CloudFlare%20JGC%20-%20Secure%202013%20and%20Virus%20Bulletin%202013.pdf">NTP might become a vector for DDoS attacks</a> because, like DNS, it is a simple UDP-based protocol that can be persuaded to return a large reply to a small request. Unfortunately, that prediction has come true.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5srV8xsezFIDmoDes2VOPh/6f097751cbd6d7277a5e67766fd6a8bb/speak_clock_1.jpg" />
            
            </figure><p>This blog post explains how an NTP-based attack works and how web site owners can help mitigate them. CloudFlare defends web sites against NTP based attacks, but it's best to stem the flow of NTP-based DDoS by making simple configuration changes to firewalls and NTP servers. Doing so makes the web safer for everyone.</p>
    <div>
      <h3>DNS Reflection is so 2013</h3>
      <a href="#dns-reflection-is-so-2013">
        
      </a>
    </div>
    <p>We've written in the past about <a href="/65gbps-ddos-no-problem">DNS-based reflection and amplification attacks</a> and NTP-based attacks use similar techniques, just a different protocol.</p><p>A reflection attack works when an attacker can send a packet with a forged source IP address. The attacker sends a packet apparently <i>from</i> the intended victim to some server on the Internet that will reply immediately. Because the source IP address is forged, the remote Internet server replies and sends data to the victim.</p><p>That has two effects: the actual source of the attack is hidden and is very hard to trace, and, if many Internet servers are used, an attack can consist of an overwhelming number of packets hitting a victim from all over the world.</p><p>But what makes reflection attacks really powerful is when they are also amplified: when a small forged packet elicits a large reply from the server (or servers). In that case, an attacker can send a small packet "from" a forged source IP address and have the server (or servers) send large replies to the victim.</p><p>Amplification attacks like that result in an attacker turning a small amount of bandwidth coming from a small number of machines into a massive traffic load hitting a victim from around the Internet. Until recently the most popular protocol for amplification attacks was DNS: a small DNS query looking up the IP address of a domain name would result in a large reply.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5AUH2Pw2eHluzo2Azh5WWJ/4093ce97e0bdc109f0e86d1847ce002a/illustration-amplification-attack-ph3.png" />
            
            </figure><p>For DNS the amplification factor (how much larger a reply is than a request) is 8x. So an attacker can generate an attack 8x larger than the bandwidth they themselves have access to. For example, an attacker controlling 10 machines with 1Gbps could generate an 80Gbps DNS amplification attack.</p><p>In the past, we've seen one attack that used SNMP for amplification: it has a factor of 650x! Luckily, there are few open SNMP servers on the Internet and SNMP usually requires authentication (although many are poorly secured). That makes SNMP attacks relatively rare.</p><p>The new kid on the block today is NTP.</p>
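<p>The bandwidth arithmetic above can be sketched in a few lines. This is an illustrative back-of-the-envelope calculation using the figures quoted in this post, not CloudFlare code:</p>

```python
# Back-of-the-envelope math for reflection/amplification attacks,
# using the figures from this post (a sketch for illustration only).

def attack_size_gbps(attacker_gbps: float, amplification: float) -> float:
    """Traffic hitting the victim = attacker's own bandwidth x amplification."""
    return attacker_gbps * amplification

# 10 machines with 1Gbps each, reflected through open DNS resolvers (8x):
print(attack_size_gbps(10 * 1.0, 8))    # matches the 80Gbps example above

# The same machines reflected through SNMP (650x) could in theory reach:
print(attack_size_gbps(10 * 1.0, 650))
```

The same formula explains why closing amplifiers (or filtering spoofed packets at the source) matters so much more than adding victim-side capacity.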
    <div>
      <h3>Network Time Protocol attacks: as easy as (UDP port) 123</h3>
      <a href="#network-time-protocol-attacks-as-easy-as-udp-port-123">
        
      </a>
    </div>
    <p>NTP is the <a href="https://en.wikipedia.org/wiki/Network_Time_Protocol">Network Time Protocol</a> that is used by machines connected to the Internet to set their clocks accurately. For example, the address time.euro.apple.com seen in the clock configuration on my Mac is actually the address of an NTP server run by Apple.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6EKI3a5J7dJs5P7nPLUZjA/d7194af903f92f223718795617112818/Screen_Shot_2014-01-09_at_11.33.15_AM.png" />
            
            </figure><p>My Mac quietly synchronizes with that server to keep its clock accurate. And, of course, NTP is not just used by Macs: it is widely used across the Internet by desktops, servers, and even phones to keep their clocks in sync.</p><p>Unfortunately, the simple UDP-based NTP protocol is prone to amplification attacks because it will reply to a packet with a spoofed source IP address and because at least one of its built-in commands will send a long reply to a short request. That makes it ideal as a DDoS tool.</p><p>NTP contains a command called monlist (or sometimes MON_GETLIST), which can be sent to an NTP server for monitoring purposes. It returns the addresses of up to the last 600 machines that the NTP server has interacted with. This response is much bigger than the request sent, making it ideal for an amplification attack.</p><p>To get an idea of how much larger, I used the <a href="http://linuxcommand.org/man_pages/ntpdc1.html">ntpdc</a> command to send a monlist command to a randomly chosen open NTP server on the Internet. Here are the request and response packets captured with <a href="http://www.wireshark.org">Wireshark</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6rrzt9EWPhDbg0F1uQkWvH/98cc24692bf708d385e729851e87ec0c/Screen_Shot_2014-01-09_at_11.47.29_AM.png" />
            
            </figure><p>At the command line I typed</p><p><code>ntpdc -c monlist 1xx.xxx.xxx.xx9</code></p><p>to send the MON_GETLIST command to the server at 1xx.xxx.xxx.xx9. The request packet is 234 bytes long. The response is split across 10 packets totaling 4,460 bytes. That's an amplification factor of 19x, and because the response is sent in many packets, an attack using this would consume a large amount of bandwidth and have a high packet rate.</p><p>This particular NTP server only had 55 addresses to tell me about. Each response packet contains 6 addresses (with one short packet at the end), so a busy server that responded with the maximum 600 addresses would send 100 packets for a total of over 48k in response to just 234 bytes. That's an amplification factor of 206x!</p><p>An attacker, armed with a list of open NTP servers on the Internet, can easily pull off a DDoS attack using NTP. And NTP servers aren't hard to find. Common tools like Metasploit and NMAP have long had modules capable of <a href="http://nmap.org/nsedoc/scripts/ntp-monlist.html">identifying NTP servers</a> that support monlist. There's also the <a href="http://openntpproject.org/">Open NTP Project</a>, which aims to highlight open NTP servers and get them patched.</p>
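<p>The amplification factors above follow directly from the byte counts in the capture. A quick sketch of the arithmetic (the 48,200-byte worst-case total is an assumption consistent with the "over 48k" figure in the text):</p>

```python
# Amplification arithmetic for the monlist capture described above.
REQUEST_BYTES = 234          # one ntpdc monlist request packet
OBSERVED_RESPONSE = 4460     # 10 packets back from a server with 55 addresses

observed = OBSERVED_RESPONSE / REQUEST_BYTES
print(round(observed))       # the ~19x factor from the capture

# Worst case: the maximum 600 addresses at 6 per packet = 100 packets,
# "over 48k" total per the capture analysis (48,200 bytes assumed here):
WORST_RESPONSE = 48_200
print(round(WORST_RESPONSE / REQUEST_BYTES))   # the ~206x worst-case factor
```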
    <div>
      <h3>Don't be part of the problem</h3>
      <a href="#dont-be-part-of-the-problem">
        
      </a>
    </div>
    <p>If you're running a normal NTP program to set the time on your server and need to know how to configure it to protect your machine, I suggest Team Cymru's excellent page on a <a href="http://www.team-cymru.org/ReadingRoom/Templates/secure-ntp-template.html">Secure NTP Template</a>. It shows how to secure an NTP client on Cisco IOS, Juniper JUNOS, or using iptables on a Linux system.</p><p>If you're running an ntpd server that needs to be on the public Internet then it's vital that it's upgraded to at least version 4.2.7p26 (more details in <a href="http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2013-5211">CVE-2013-5211</a>). The vulnerability was classed as a bug in the ntpd bug database (issue <a href="http://bugs.ntp.org/show_bug.cgi?id=1532">1532</a>).</p><p>If you are running an ntpd server and still need something like monlist, there's the mrulist command (see issue <a href="http://bugs.ntp.org/show_bug.cgi?id=1531">1531</a>), which now requires a nonce (a proof that the command came from the IP address in the UDP packet).</p><p>Neither of these changes is recent: ntpd v4.2.7p26 was released on March 24, 2010, so upgrading doesn't require using bleeding-edge code.</p><p>If you're running a network (or are a service provider) then it's vital that you implement <a href="http://tools.ietf.org/html/bcp38">BCP-38</a>. Implementation of it (and the related BCP-84) would eliminate spoofed-source-IP attacks of all kinds (DNS, NTP, SNMP, ...).</p>
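<p>As one hedged illustration of the advice above (the exact directives vary by ntpd version, so treat this as a sketch and defer to the Team Cymru template linked earlier), an ntp.conf for a public-facing ntpd typically refuses status and monitoring queries, monlist among them, with restrict directives:</p>

```
# /etc/ntp.conf (sketch): deny status/monitoring queries by default
restrict default kod nomodify notrap nopeer noquery
restrict -6 default kod nomodify notrap nopeer noquery

# Allow the local host to manage the daemon
restrict 127.0.0.1
restrict -6 ::1

# Belt and braces: turn off the monitoring facility that feeds monlist
disable monitor
```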
    <div>
      <h3>Further Reading</h3>
      <a href="#further-reading">
        
      </a>
    </div>
    <p>If you're interested in further background on reflection and amplification attacks, take a look at my October 2013 presentation "How to launch and defend against a DDoS".</p><p><a href="https://www.slideshare.net/jgrahamc/cloud-flarejgc-secure2013andvirusbulletin2013">How to launch and defend against a DDoS</a> from <a href="http://www.slideshare.net/jgrahamc"><b>jgrahamc</b></a></p>
    <div>
      <h3>Footnote</h3>
      <a href="#footnote">
        
      </a>
    </div>
    <p>The black and white photograph at the top of this blog post shows the UK's original <a href="http://en.wikipedia.org/wiki/Speaking_clock">speaking clock</a> and the original voice of the clock <a href="http://en.wikipedia.org/wiki/Jane_Cain">Jane Cain</a>. A common way to synchronize clocks and watches was to telephone the speaking clock to get the precise time.</p><p>Geeks like me will be amused that the NTP UDP port for time synchronization is 123 and that the telephone number of the UK speaking clock is also 123. Even today dialing 123 in the UK gets you the time.</p> ]]></content:encoded>
            <category><![CDATA[Attacks]]></category>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Mitigation]]></category>
            <guid isPermaLink="false">71WUSJlXSKsrEbQfh4IuCR</guid>
            <dc:creator>John Graham-Cumming</dc:creator>
        </item>
        <item>
            <title><![CDATA[The DDoS that almost broke the Internet]]></title>
            <link>https://blog.cloudflare.com/the-ddos-that-almost-broke-the-internet/</link>
            <pubDate>Wed, 27 Mar 2013 16:35:00 GMT</pubDate>
            <description><![CDATA[ The New York Times this morning published a story about the Spamhaus DDoS attack and how CloudFlare helped mitigate it and keep the site online. The Times calls the attack the largest known DDoS attack ever on the Internet. ]]></description>
            <content:encoded><![CDATA[ <p></p><p>The <i>New York Times</i> this morning published a story about the <a href="http://www.nytimes.com/2013/03/27/technology/internet/online-dispute-becomes-internet-snarling-attack.html?">Spamhaus DDoS attack and how CloudFlare helped mitigate it and keep the site online</a>. The <i>Times</i> calls the attack the largest known DDoS attack ever on the Internet. We <a href="/the-ddos-that-knocked-spamhaus-offline-and-ho">wrote about the attack last week</a>. At the time, it was a large attack, sending 85Gbps of traffic. Since then, the attack got much worse. Here are some of the technical details of what we've seen.</p>
    <div>
      <h3>Growth Spurt</h3>
      <a href="#growth-spurt">
        
      </a>
    </div>
    <p>On Monday, March 18, 2013, Spamhaus contacted CloudFlare regarding an attack they were seeing against their website <a href="http://www.spamhaus.org">spamhaus.org</a>. They signed up for CloudFlare and we quickly mitigated the attack. The attack, initially, was approximately 10Gbps, generated largely from open DNS recursors. On March 19, the attack increased in size, peaking at approximately 90Gbps. The attack fluctuated between 90Gbps and 30Gbps until 01:15 UTC on March 21.</p><p>The attackers were quiet for a day. Then, on March 22 at 18:00 UTC, the attack resumed, peaking at 120Gbps of traffic hitting our network. As we discussed in the previous blog post, CloudFlare uses Anycast technology, which spreads the load of a distributed attack across all our data centers. This allowed us to mitigate the attack without it affecting Spamhaus or any of our other customers. The attackers ceased their attack against the Spamhaus website four hours after it started.</p><p>Other than the scale, which was already among the largest <a href="https://www.cloudflare.com/learning/ddos/what-is-a-ddos-attack/">DDoS attacks</a> we've seen, there was nothing particularly unusual about the attack to this point. Then the attackers changed their tactics. Rather than attacking our customers directly, they started going after the network providers CloudFlare uses for bandwidth. More on that in a second; first, a bit about how the Internet works.</p>
    <div>
      <h3>Peering on the Internet</h3>
      <a href="#peering-on-the-internet">
        
      </a>
    </div>
    <p>The "inter" in Internet refers to the fact that it is a collection of independent networks connected together. CloudFlare runs a network, Google runs a network, and bandwidth providers like Level3, AT&amp;T, and Cogent run networks. These networks then interconnect through what are known as peering relationships.</p><p>When you surf the web, your browser sends and receives <a href="https://www.cloudflare.com/learning/network-layer/what-is-a-packet/">packets of information</a>. These packets are sent from one network to another. You can see this by running a traceroute. Here's one from <a href="http://www.slac.stanford.edu/cgi-bin/nph-traceroute.pl">Stanford University's network</a> to the New York Times' website (nytimes.com):</p><p><code>1  rtr-servcore1-serv01-webserv.slac.stanford.edu (134.79.197.130)  0.572 ms
2  rtr-core1-p2p-servcore1.slac.stanford.edu (134.79.252.166)  0.796 ms
3  rtr-border1-p2p-core1.slac.stanford.edu (134.79.252.133)  0.536 ms
4  slac-mr2-p2p-rtr-border1.slac.stanford.edu (192.68.191.245)  25.636 ms
5  sunncr5-ip-a-slacmr2.es.net (134.55.36.21)  3.306 ms
6  eqxsjrt1-te-sunncr5.es.net (134.55.38.146)  1.384 ms
7  xe-0-3-0.cr1.sjc2.us.above.net (64.125.24.1)  2.722 ms
8  xe-0-1-0.mpr1.sea1.us.above.net (64.125.31.17)  20.812 ms
9  209.249.122.125 (209.249.122.125)  21.385 ms</code></p><p>There are three networks in the above traceroute: stanford.edu, es.net, and above.net. The request starts at Stanford. Between lines 4 and 5 it passes from Stanford's network to their peer es.net. Then, between lines 6 and 7, it passes from es.net to above.net, which appears to provide hosting for the New York Times. This means Stanford has a peering relationship with ES.net. ES.net has a peering relationship with Above.net. And Above.net provides connectivity for the New York Times.</p><p>CloudFlare connects to a large number of networks. You can get a sense of some, although not all, of the networks we peer with through a tool like <a href="http://bgp.he.net/AS13335#_peers">Hurricane Electric's BGP looking glass</a>. CloudFlare connects to peers in two ways. First, we connect directly to certain large carriers and other networks to which we send a large amount of traffic. In this case, we connect our router directly to the router at the border of the other network, usually with a piece of fiber optic cable. Second, we connect to what are known as <a href="https://www.cloudflare.com/learning/cdn/glossary/internet-exchange-point-ixp/">Internet Exchanges</a>, IXs for short, where a number of networks meet in a central point.</p><p>Most major cities have an IX. The model for IXs differs in different parts of the world. Europe runs some of the most robust IXs, and CloudFlare connects to several of them including LINX (the London Internet Exchange), AMS-IX (the Amsterdam Internet Exchange), and DE-CIX (the Frankfurt Internet Exchange), among others. The major networks that make up the Internet -- Google, Facebook, Yahoo, etc. -- connect to these same exchanges to pass traffic between each other efficiently. When the Spamhaus attacker realized he couldn't go after CloudFlare directly, he began targeting our upstream peers and exchanges.</p>
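<p>The hand analysis of the traceroute above, spotting where hops cross from stanford.edu to es.net to above.net, can be sketched mechanically. The hostnames below are copied from that trace; the last-two-labels heuristic is an assumption for illustration, since real network boundaries come from BGP and registry data:</p>

```python
# Group traceroute hops into networks by their trailing DNS labels
# (a crude heuristic sketch, not a real network-boundary detector).

hops = [
    "rtr-servcore1-serv01-webserv.slac.stanford.edu",
    "rtr-core1-p2p-servcore1.slac.stanford.edu",
    "rtr-border1-p2p-core1.slac.stanford.edu",
    "slac-mr2-p2p-rtr-border1.slac.stanford.edu",
    "sunncr5-ip-a-slacmr2.es.net",
    "eqxsjrt1-te-sunncr5.es.net",
    "xe-0-3-0.cr1.sjc2.us.above.net",
    "xe-0-1-0.mpr1.sea1.us.above.net",
]

def network_of(hostname: str) -> str:
    # Assume the last two labels name the network (stanford.edu, es.net, ...).
    return ".".join(hostname.split(".")[-2:])

networks: list[str] = []
for hop in hops:
    net = network_of(hop)
    if not networks or networks[-1] != net:
        networks.append(net)

print(networks)  # ['stanford.edu', 'es.net', 'above.net']
```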
    <div>
      <h3>Headwaters</h3>
      <a href="#headwaters">
        
      </a>
    </div>
    <p>Once the attackers realized they couldn't knock CloudFlare itself offline even with more than 100Gbps of DDoS traffic, they went after our direct peers. In this case, they attacked the providers from whom CloudFlare buys bandwidth. We, primarily, contract with what are known as Tier 2 providers for CloudFlare's paid bandwidth. These companies peer with other providers and also buy bandwidth from so-called Tier 1 providers.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/22hbMTdkpSNRSpAw5DbuRP/fa5af36d7fc2d355cdc543324139876e/peer_pressure.png.scaled500.png" />
            
            </figure><p>There are <a href="http://en.wikipedia.org/wiki/Tier_1_network">approximately a dozen Tier 1 providers</a> on the Internet. The nature of these providers is that they don't buy bandwidth from anyone. Instead, they engage in what is known as settlement-free peering with the other Tier 1 providers. Tier 2 providers interconnect with each other and then buy bandwidth from the Tier 1 providers in order to ensure they can connect to every other point on the Internet. At the core of the Internet, if all else fails, it is these Tier 1 providers that ensure that every network is connected to every other network. If one of them fails, it's a big deal.</p><p>Anycast means that if the attacker attacked the last step in the traceroute then their attack would be spread across CloudFlare's worldwide network, so instead they attacked the second to last step which concentrated the attack on one single point. This wouldn't cause a network-wide outage, but it could potentially cause regional problems.</p><p>We carefully select our bandwidth providers to ensure they have the ability to deal with attacks like this. Our direct peers quickly filtered attack traffic at their edge. This pushed the attack upstream to their direct peers, largely Tier 1 networks. Tier 1 networks don't buy bandwidth from anyone, so the majority of the weight of the attack ended up being carried by them. While we don't have direct visibility into the traffic loads they saw, we have been told by one major Tier 1 provider that they saw more than 300Gbps of attack traffic related to this attack. That would make this attack one of the largest ever reported.</p><p>The challenge with attacks at this scale is they risk overwhelming the systems that link together the Internet itself. The largest routers that you can buy have, at most, 100Gbps ports. 
It is possible to bond more than one of these ports together to create capacity greater than 100Gbps; however, at some point, there are limits to how much these routers can handle. If that limit is exceeded then the network becomes congested and slows down.</p><p>Over the last few days, as these attacks have increased, we've seen congestion across several major Tier 1s, primarily in Europe where most of the attacks were concentrated, which would have affected hundreds of millions of people even as they surfed sites unrelated to Spamhaus or CloudFlare. If the Internet felt a bit more sluggish for you over the last few days in Europe, this may be part of the reason why.</p>
    <div>
      <h3>Attacks on the IXs</h3>
      <a href="#attacks-on-the-ixs">
        
      </a>
    </div>
    <p>In addition to CloudFlare's direct peers, we also connect with other networks over the so-called Internet Exchanges (IXs). These IXs are, at their most basic level, switches into which multiple networks connect and can then pass bandwidth. In Europe, these IXs are run as non-profit entities and are considered critical infrastructure. They interconnect hundreds of the world's largest networks including CloudFlare, Google, Facebook, and just about every other major Internet company.</p><p>Beyond attacking CloudFlare's direct peers, the attackers also attacked the core IX infrastructure on the London Internet Exchange (LINX), the Amsterdam Internet Exchange (AMS-IX), the Frankfurt Internet Exchange (DE-CIX), and the Hong Kong Internet Exchange (HKIX). From our perspective, the attacks had the largest effect on LINX, impacting both the exchange itself and the systems LINX uses to monitor it, as visible in the drop in traffic recorded by those monitoring systems. (Corrected: see below for original phrasing.)</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6fHxhpq7UXrJ6WOosNIKdK/d6a45067b94605961eb2871a50f3b661/linx_traffic.png.scaled500.png" />
            
            </figure><p>The congestion impacted many of the networks on the IXs, including CloudFlare's. As problems were detected on the IX, we would route traffic around them. However, several London-based CloudFlare users reported intermittent issues over the last several days. This is the root cause of those problems.</p><p>The attacks also exposed some vulnerabilities in the architecture of some IXs. We, along with many other network security experts, worked with the team at LINX to better secure themselves. In doing so, we developed a list of best practices for any IX in order to make them less vulnerable to attacks.</p><p>Two specific suggestions to limit attacks like this involve making it more difficult to attack the IP addresses that members of the IX use to interchange traffic between each other. We are working with IXs to ensure that: 1) these IP addresses should not be announced as routable across the public Internet; and 2) packets destined to these IP addresses should only be permitted from other IX IP addresses. We've been very impressed with the team at LINX and how quickly they've worked to implement these changes and add additional security to their IX and are hopeful other IXs will quickly follow their lead.</p>
    <div>
      <h3>The Full Impact of the Open Recursor Problem</h3>
      <a href="#the-full-impact-of-the-open-recursor-problem">
        
      </a>
    </div>
    <p>At the bottom of this attack we once again find the problem of open DNS recursors. The attackers were able to generate more than 300Gbps of traffic, likely with a network of their own that had access to only 1/100th of that bandwidth themselves. We've written before that these mis-configured DNS recursors are a <a href="/deep-inside-a-dns-amplification-ddos-attack">bomb waiting to go off</a>, one that literally threatens the stability of the Internet itself. We've now seen an attack that begins to illustrate the full extent of the problem.</p><p>While lists of open recursors have been passed around on network security lists for the last few years, on Monday the full extent of the problem was, for the first time, made public. The <a href="http://openresolverproject.org">Open Resolver Project</a> made available the full list of the 21.7 million open resolvers online in an effort to shut them down.</p><p>We'd debated doing the same thing ourselves for some time but worried about the collateral damage of what would happen if such a list fell into the hands of the bad people. The last five days have made clear that the bad people have the list of open resolvers and they are getting increasingly brazen in the attacks they are willing to launch. We are in full support of the Open Resolver Project and believe it is incumbent on all network providers to work with their customers to close any open resolvers running on their networks.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7LNpfbEWSkEYDsC7wGJL6D/31e15acde4efadce0767f14b13a3945f/bazookas.jpg.scaled500.jpg" />
            
            </figure><p>Unlike traditional <a href="https://www.cloudflare.com/learning/ddos/what-is-a-ddos-botnet/">botnets</a> which could only generate limited traffic because of the modest Internet connections and home PCs they typically run on, these open resolvers are typically running on big servers with fat pipes. They are like bazookas and the events of the last week have shown the damage they can cause. What's troubling is that, compared with what is possible, this attack may prove to be relatively modest.</p><p>As someone in charge of <a href="https://www.cloudflare.com/learning/ddos/ddos-mitigation/">DDoS mitigation</a> at one of the Internet giants emailed me this weekend: "I've often said we don't have to prepare for the largest-possible attack, we just have to prepare for the largest attack the Internet can send without causing massive collateral damage to others. It looks like you've reached that point, so...congratulations!"</p><p>At CloudFlare one of our goals is to make DDoS something you only read about in the history books. We're proud of how our network held up under such a massive attack and are working with our peers and partners to ensure that the Internet overall can stand up to the threats it faces.</p><p><b><i>Correction</i></b><i>: The original sentence about the impact on LINX was "From our perspective, the attacks had the largest effect on LINX which for a little over an hour on March 23 saw the infrastructure serving more than half of the usual 1.5Tbps of peak traffic fail." That was not well phrased, and has been edited, with notation in place.</i></p> ]]></content:encoded>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Attacks]]></category>
            <category><![CDATA[Mitigation]]></category>
            <guid isPermaLink="false">4LZr62jdUEvyAvcv3e8doO</guid>
            <dc:creator>Matthew Prince</dc:creator>
        </item>
        <item>
            <title><![CDATA[The DDoS That Knocked Spamhaus Offline (And How We Mitigated It)]]></title>
            <link>https://blog.cloudflare.com/the-ddos-that-knocked-spamhaus-offline-and-ho/</link>
            <pubDate>Wed, 20 Mar 2013 18:26:00 GMT</pubDate>
            <description><![CDATA[ At CloudFlare, we deal with large DDoS attacks every day. Usually, these attacks are directed at large companies or organizations that are reluctant to talk about their details. Sometimes a customer is willing to let us tell their story. ]]></description>
            <content:encoded><![CDATA[ <p>At CloudFlare, we deal with large DDoS attacks every day. Usually, these attacks are directed at large companies or organizations that are reluctant to talk about their details. It's fun, therefore, whenever we have a customer that is willing to let us tell the story of an attack they saw and how we mitigated it. This is one of those stories.</p>
    <div>
      <h3>Spamhaus</h3>
      <a href="#spamhaus">
        
      </a>
    </div>
    <p>Yesterday, Tuesday, March 19, 2013, CloudFlare was contacted by the non-profit anti-spam organization <a href="http://www.spamhaus.org/">Spamhaus</a>. They were suffering a large DDoS attack against their website and asked if we could help mitigate the attack.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4Wecs529sGWHAYasTfE7Gq/f9dba38714a753f2adcf293204bb512f/spamhaus_logo.jpg.scaled500.jpg" />
            
            </figure><p>Spamhaus provides one of the key backbones that underpins much of the anti-spam filtering online. Run by a tireless team of volunteers, Spamhaus patrols the Internet for spammers and publishes a list of the servers they use to send their messages in order to empower email system administrators to filter unwanted messages. Spamhaus's services are so pervasive and important to the operation of the Internet's email architecture that, when a <a href="http://www.theregister.co.uk/2011/09/05/spamhaus_e360_insight_lawsuit/">lawsuit threatened to shut the service down</a>, industry experts testified [<a href="http://app.quickblogcast.com/files/31236-29497/spamhaus_amicus.pdf">PDF</a>; full disclosure: I wrote the brief back in the day] that doing so risked literally breaking email, since Spamhaus is directly or indirectly responsible for filtering as much as 80% of daily spam messages.</p><p>Beginning on March 18, the Spamhaus site <a href="https://isc.sans.edu/diary/Spamhaus+DDOS/15427">came under attack</a>. The attack was large enough that the Spamhaus team wasn't sure of its size when they contacted us. It was sufficiently large to fully saturate their connection to the rest of the Internet and knock their site offline. These very large attacks, which are known as Layer 3 attacks, are difficult to stop with any on-premise solution. Put simply: if you have a router with a 10Gbps port, and someone sends you 11Gbps of traffic, it doesn't matter what intelligent software you have to stop the attack because your network link is completely saturated.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/51oSfCfTv2ERCl2eSuoPfk/4ab3a890dc469279d18266e469d4ec6d/burst_pipe.jpg.scaled500.jpg" />
            
            </figure><p>While we don't know who was behind this attack, Spamhaus has made plenty of enemies over the years. Spammers aren't always the most lovable of individuals and Spamhaus has been threatened, sued, and DDoSed regularly. Spamhaus's blocklists are distributed via DNS and there is a long list of volunteer organizations that mirror their <a href="https://www.cloudflare.com/learning/dns/what-is-dns/">DNS infrastructure</a> in order to ensure it is resilient to attacks. The website, however, was unreachable.</p>
    <div>
      <h4>Filling Up the Series of Tubes</h4>
      <a href="#filling-up-the-series-of-tubes">
        
      </a>
    </div>
    <p>Very large Layer 3 attacks nearly always originate from a number of sources. These many sources each send traffic to a single Internet location, effectively creating a tidal wave that overwhelms the target's resources. In this sense, the attack is distributed (the first D in DDoS -- Distributed Denial of Service). The sources of attack traffic can be a group of individuals working together (e.g., the Anonymous LOIC model, although this is Layer 7 traffic and usually much smaller in volume than other methods), a botnet of compromised PCs, a botnet of compromised servers, <a href="/deep-inside-a-dns-amplification-ddos-attack">misconfigured DNS resolvers</a>, or even <a href="http://internetcensus2012.bitbucket.org/paper.html">home Internet routers with weak passwords</a>.</p><p>Since an attacker attempting to launch a Layer 3 attack doesn't care about receiving a response to the requests they send, the packets that make up the attack do not have to be accurate or correctly formatted. Attackers will regularly spoof all the information in the attack packets, including the source IP, making it look like the attack is coming from a virtually infinite number of sources. Since packet data can be fully randomized, techniques like IP filtering, even upstream, become virtually useless.</p><p>Spamhaus signed up for CloudFlare on Tuesday afternoon and we immediately mitigated the attack, making the site once again reachable. (More on how we did that below.) Once on our network, we also began recording data about the attack. At first, the attack was relatively modest (around 10Gbps). There was a brief spike around 16:30 UTC, likely a test, that lasted approximately 10 minutes. Then, around 21:30 UTC, the attackers let loose a very large wave.</p><p>The graph below is generated from bandwidth samples across a number of the routers that sit in front of servers we use for DDoS scrubbing.
The green area represents in-bound requests and the blue line represents out-bound responses. While there is always some attack traffic on our network, it's easy to see when the attack against Spamhaus started and then began to taper off around 02:30 UTC on March 20, 2013. As I'm writing this at 16:15 UTC on March 20, 2013, it appears the attack is picking up again.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7kvAKu7knFauQdp1nGcImO/4e8da47a3589b07052b4377f0f7d47a4/spamhaus_ddos_attack.png.scaled500.png" />
            
            </figure>
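    <p>The futility of per-source filtering against spoofed traffic is easy to demonstrate with a toy experiment (the numbers are illustrative; real filters are more sophisticated, but they face the same address-space problem):</p>

```python
import random

# Toy experiment: blocklist every "source" we see, then count how much
# spoofed traffic still gets through when each packet claims a fresh
# random 32-bit source address. All numbers here are illustrative.

random.seed(0)
blocklist = set()
passed = 0
total = 100_000

for _ in range(total):
    spoofed_src = random.getrandbits(32)   # attacker forges the source-IP field
    if spoofed_src in blocklist:
        continue                           # filtered (this almost never happens)
    passed += 1
    blocklist.add(spoofed_src)             # too late: that "address" won't repeat

print(f"{passed} of {total} packets got through")   # effectively all of them
```

    <p>With 2^32 possible forged addresses, the chance of a repeat within 100,000 packets is negligible, so the blocklist never converges on anything useful.</p>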
    <div>
      <h3>How to Generate a 75Gbps DDoS</h3>
      <a href="#how-to-generate-a-75gbps-ddos">
        
      </a>
    </div>
    <p>The largest source of attack traffic against Spamhaus came from DNS reflection. I've <a href="/deep-inside-a-dns-amplification-ddos-attack">written about these attacks before</a>, and in the last year they have become the source of the largest Layer 3 DDoS attacks we see (sometimes well exceeding 100Gbps). Open DNS resolvers are quickly becoming the scourge of the Internet, and the size of these attacks will only continue to rise until all providers make a <a href="/good-news-open-dns-resolvers-are-getting-clos">concerted effort to close them</a>. (It also makes sense to implement <a href="http://tools.ietf.org/html/bcp38">BCP-38</a>, but that's a topic for another post.)</p><p>The basic technique of a DNS reflection attack is to send a request for a large DNS zone file, with the source IP address spoofed to be the intended victim, to a large number of open DNS resolvers. The resolvers then respond to the request, sending the large DNS zone answer to the intended victim. The attackers' requests are only a fraction of the size of the responses, meaning the attacker can effectively amplify their attack to many times the size of the bandwidth resources they themselves control.</p><p>In the Spamhaus case, the attacker was sending requests for the DNS zone file for ripe.net to open DNS resolvers. The attacker spoofed the CloudFlare IPs we'd issued for Spamhaus as the source in their DNS requests. The open resolvers responded with the DNS zone file, generating collectively approximately 75Gbps of attack traffic. Each request was likely approximately 36 bytes long (e.g. dig ANY ripe.net @X.X.X.X +edns=0 +bufsize=4096, where X.X.X.X is replaced with the IP address of an open DNS resolver) and each response approximately 3,000 bytes, translating to roughly a 100x amplification factor.</p><p>We recorded over 30,000 unique DNS resolvers involved in the attack. 
This translates to each open DNS resolver sending an average of 2.5Mbps, small enough to fly under the radar of most resolver operators. Thanks to the amplification, the attacker only needed to control a botnet or cluster of servers capable of generating 750Mbps -- possible with a small botnet or a handful of AWS instances. It is worth repeating: open DNS resolvers are the scourge of the Internet, and these attacks will become more common and larger until service providers make a serious effort to close them.</p>
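            <p>Back-of-the-envelope, those numbers hang together (the inputs below are the approximate figures quoted above, not measurements):</p>

```python
# Sanity-check of the DNS reflection arithmetic described above.
# All inputs are the approximate figures quoted in this post.

request_bytes = 36       # ~size of one spoofed "dig ANY ripe.net" query
response_bytes = 3_000   # ~size of the zone-file response

amplification = response_bytes / request_bytes
print(f"amplification: ~{amplification:.0f}x")       # ~83x, on the order of 100x

attack_mbps = 75 * 1_000          # 75Gbps peak, expressed in Mbps
resolvers = 30_000                # unique open resolvers observed

per_resolver_mbps = attack_mbps / resolvers
print(f"per resolver: {per_resolver_mbps} Mbps")     # 2.5 Mbps each

# At the rounded 100x amplification, the attacker's own sourcing requirement:
attacker_mbps = attack_mbps / 100
print(f"attacker must source: {attacker_mbps:.0f} Mbps")  # 750 Mbps
```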
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5R2fmKkhGJ7YBYFQaiwk0U/89e5ea8a65d518260911e64e91315e45/im_under_attack.jpg.scaled500.jpg" />
            
            </figure>
    <div>
      <h3>How You Mitigate a 75Gbps DDoS</h3>
      <a href="#how-you-mitigate-a-75gbps-ddos">
        
      </a>
    </div>
    <p>While large Layer 3 attacks are difficult for an on-premise DDoS solution to mitigate, CloudFlare's network was specifically designed from the beginning to stop these types of attacks. We make heavy use of Anycast: the same IP address is announced from every one of our 23 worldwide data centers, and the network itself <a href="/cloudflares-architecture-eliminating-single-p">load balances requests</a> to the nearest facility. Under normal circumstances, this helps us ensure a visitor is routed to the nearest data center on our network.</p><p>When there's an attack, Anycast serves to effectively dilute it by spreading it across our facilities. Since every data center announces the same IP address for any CloudFlare customer, traffic cannot be concentrated in any one location. Instead of the attack being many-to-one, it becomes many-to-many, with no single point on the network acting as a bottleneck.</p><p>Once diluted, the attack becomes relatively easy to stop at each of our data centers. Because CloudFlare acts as a virtual shield in front of our customers' sites, none of the Layer 3 attack traffic reaches the customer's servers. Traffic to Spamhaus's network dropped below its pre-attack levels as soon as they signed up for our service.</p>
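    <p>The dilution arithmetic is straightforward (an idealized even split; in practice the share each facility absorbs depends on where the attack sources sit relative to our data centers):</p>

```python
# Idealized sketch of Anycast dilution using the figures in this post.

attack_gbps = 75       # peak attack bandwidth
data_centers = 23      # CloudFlare data centers announcing the same IPs

# Many-to-one becomes many-to-many: each facility sees only a slice.
per_dc_gbps = attack_gbps / data_centers
print(f"~{per_dc_gbps:.1f} Gbps per data center")   # ~3.3 Gbps, a far easier scrubbing job
```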
    <div>
      <h3>Other Noise</h3>
      <a href="#other-noise">
        
      </a>
    </div>
    <p>While the majority of the traffic involved in the attack was DNS reflection, the attacker threw in a few other attack methods as well. One was a so-called ACK reflection attack. When a TCP connection is established there is a handshake: the host initiating the TCP session first sends a SYN (for synchronize) packet to the receiving server, the receiving server responds with a SYN-ACK (for acknowledge), and the initiator completes the handshake with a final ACK. After that handshake, data can be exchanged.</p><p>In an ACK reflection, the attacker sends a number of SYN packets to servers with a spoofed source IP address pointing to the intended victim. The servers then respond to the victim's IP with an ACK. Like the DNS reflection attack, this disguises the source of the attack, making it appear to come from legitimate servers. Unlike the DNS reflection attack, however, there is no amplification factor: the bandwidth of the reflected ACKs is symmetrical to the bandwidth the attacker must spend generating the SYNs. CloudFlare is configured to drop unmatched ACKs, which mitigates these types of attacks.</p><p>Whenever we see one of these large attacks, network operators write to us upset that we are attacking their infrastructure with abusive DNS queries or SYN floods. In fact, it is their infrastructure that is being used to reflect an attack at us. By working with and educating network operators, we help them clean up their networks, which addresses the root cause of these large attacks.</p>
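    <p>Conceptually, dropping unmatched ACKs is a stateful check: an inbound ACK is only legitimate if it corresponds to a handshake we actually started. A toy sketch (a hypothetical simplification -- the real mitigation runs in the packet path with per-flow state, not a Python set, and the names below are illustrative):</p>

```python
# Toy sketch of "drop unmatched ACKs". We remember the 4-tuple of every SYN
# we send, and drop any inbound ACK whose 4-tuple we have no record of.
# Structure and names are illustrative, not CloudFlare's implementation.

outstanding_syns = set()   # 4-tuples of handshakes we initiated

def sent_syn(src_ip, src_port, dst_ip, dst_port):
    """Record that we sent a SYN for this connection."""
    outstanding_syns.add((src_ip, src_port, dst_ip, dst_port))

def accept_ack(src_ip, src_port, dst_ip, dst_port):
    """Return True if an inbound ACK matches a handshake we started."""
    # The inbound ACK's source/destination are the reverse of our SYN's.
    return (dst_ip, dst_port, src_ip, src_port) in outstanding_syns

sent_syn("198.51.100.1", 4242, "192.0.2.7", 80)

# ACK from the peer we actually contacted: accepted.
print(accept_ack("192.0.2.7", 80, "198.51.100.1", 4242))    # True

# Reflected ACK triggered by an attacker's spoofed SYN: no matching state, dropped.
print(accept_ack("203.0.113.9", 80, "198.51.100.1", 9999))  # False
```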
    <div>
      <h3>History Repeats Itself</h3>
      <a href="#history-repeats-itself">
        
      </a>
    </div>
    <p>Finally, it's worth noting how similar this battle against DDoS attacks and open DNS resolvers is to Spamhaus's original fight. If DDoS is the network scourge of tomorrow, spam was its clear predecessor. Paul Vixie, <a href="http://en.wikipedia.org/wiki/DNSBL">the father of the DNSBL</a>, set out in 1997 to use DNS to help shut down the spam source of the day: open email relays. These relays were being used to disguise the origin of spam messages, making them more difficult to block. What was needed was a list of open mail relays that mail servers could query to decide whether to accept messages.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3AGO4753NTPKp5Z45WlMhk/226168b5efcdda27b3568adcc9d80d86/history_repeats_itself.png.scaled500.png" />
            
            </figure><p>While it wasn't originally designed with that idea in mind, DNS proved a highly scalable and efficient means to distribute a queryable list of open mail relays that email service providers could use to block unwanted messages. Spamhaus arose as one of the most respected and widely used DNSBLs, effectively blocking a huge percentage of daily spam volume.</p><p>As open mail relays were shut down, spammers turned to virus writers to create botnets that could be used to relay spam. Spamhaus expanded their operations to list the IPs of known botnets, trying to stay ahead of spammers. CloudFlare's own history grew out of <a href="http://www.projecthoneypot.org/">Project Honey Pot</a>, which started as an automated service to track the resources used by spammers and which publishes the HTTP:BL.</p><p>Today, as Spamhaus's success has eroded the business model of spammers, botnet operators are increasingly renting out their networks to launch DDoS attacks. At the same time, DNSBLs proved that the DNS protocol could serve many functions, encouraging many people to tinker with installing their own DNS resolvers. Unfortunately, these resolvers are often misconfigured and left open to abuse, making them the DDoS equivalent of the open mail relay.</p><p>If you're running a network, take a second to make sure you've closed any open resolvers before DDoS explodes into an even worse problem than it already is.</p> ]]></content:encoded>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Attacks]]></category>
            <category><![CDATA[Mitigation]]></category>
            <guid isPermaLink="false">4HSY3RI006GQzjTzxBvan9</guid>
            <dc:creator>Matthew Prince</dc:creator>
        </item>
    </channel>
</rss>