
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Sat, 04 Apr 2026 16:02:26 GMT</lastBuildDate>
        <item>
            <title><![CDATA[xdpcap: XDP Packet Capture]]></title>
            <link>https://blog.cloudflare.com/xdpcap/</link>
            <pubDate>Wed, 24 Apr 2019 18:21:59 GMT</pubDate>
            <description><![CDATA[ Our servers manage heaps of network packets, from legit traffic to big DDoS attacks. For top efficiency, we adopted eXpress Data Path (XDP), a Linux kernel tool for swift, low-level packet handling. ]]></description>
            <content:encoded><![CDATA[ <p>Our servers process a lot of network packets, be it legitimate traffic or <a href="/say-cheese-a-snapshot-of-the-massive-ddos-attacks-coming-from-iot-cameras/">large</a> <a href="/how-the-consumer-product-safety-commission-is-inadvertently-behind-the-internets-largest-ddos-attacks/">denial of service</a> <a href="/reflections-on-reflections/">attacks</a>. To do so efficiently, we’ve embraced <a href="http://docs.cilium.io/en/latest/bpf/">eXpress Data Path (XDP)</a>, a Linux kernel technology that provides a high performance mechanism for low level packet processing. We’re using it to <a href="/l4drop-xdp-ebpf-based-ddos-mitigations/">drop DoS attack packets with L4Drop</a>, and also in our new layer 4 load balancer. But there’s a downside to XDP: because it processes packets before the normal Linux network stack sees them, packets redirected or dropped are invisible to regular debugging tools such as <a href="https://www.tcpdump.org/">tcpdump</a>.</p><p>To address this, we built a tcpdump replacement for XDP, xdpcap. We are open sourcing this tool: the <a href="https://github.com/cloudflare/xdpcap">code and documentation are available on GitHub</a>.</p><p>xdpcap is built on cbpfc, our compiler from classic BPF (cBPF) to eBPF or C, which we are also open sourcing: the <a href="https://github.com/cloudflare/cbpfc">code and documentation are available on GitHub</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1X6o2OMKm5ePt10Xm3Vgpr/deb5e63073ba5b69d08be1e41ba4344b/White_tailed_eagle_raftsund_square_crop.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a> <a href="https://commons.wikimedia.org/wiki/File:White_tailed_eagle_raftsund_square_crop.jpg">image</a> by <a href="http://www.christophmueller.org">Christoph Müller</a></p><p>Tcpdump provides an easy way to dump specific packets of interest. For example, to capture all IPv4 DNS packets, one could:</p>
            <pre><code>$ tcpdump ip and udp port 53</code></pre>
            <p>xdpcap reuses the same syntax! xdpcap can write packets to a pcap file:</p>
            <pre><code>$ xdpcap /path/to/hook capture.pcap "ip and udp port 53"
XDPAborted: 0/0   XDPDrop: 0/0   XDPPass: 254/0   XDPTx: 0/0   (received/matched packets)
XDPAborted: 0/0   XDPDrop: 0/0   XDPPass: 995/1   XDPTx: 0/0   (received/matched packets)</code></pre>
            <p>Or write the pcap to stdout, and decode the packets with tcpdump:</p>
            <pre><code>$ xdpcap /path/to/hook - "ip and udp port 53" | sudo tcpdump -r -
reading from file -, link-type EN10MB (Ethernet)
16:18:37.911670 IP 1.1.1.1 &gt; 1.2.3.4.21563: 26445$ 1/0/1 A 93.184.216.34 (56)</code></pre>
            <p>The remainder of this post explains how we built xdpcap, including how <code>/path/to/hook</code> is used to attach to XDP programs.</p>
    <div>
      <h2>tcpdump</h2>
      <a href="#tcpdump">
        
      </a>
    </div>
    <p>To replicate tcpdump, we first need to understand its inner workings. <a href="/bpf-the-forgotten-bytecode/">Marek Majkowski has previously written a detailed post on the subject</a>. Tcpdump exposes a high level filter language, <a href="https://www.tcpdump.org/manpages/pcap-filter.7.html">pcap-filter</a>, to specify which packets are of interest. Reusing our earlier example, the following filter expression captures all IPv4 UDP packets to or from port 53, likely DNS traffic:</p>
            <pre><code>ip and udp port 53</code></pre>
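            <p>Written out as plain C, the checks this filter performs look roughly like the following. This is a hypothetical model of the filter's logic (including libpcap's implicit check that the packet is not a later IP fragment), not how tcpdump implements it:</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Load a big-endian 16-bit field from packet data. */
static uint16_t load_be16(const uint8_t *p) {
    return (uint16_t)(p[0] << 8 | p[1]);
}

/* Model of "ip and udp port 53" over an Ethernet frame: IPv4,
 * protocol UDP, source or destination port 53. Any access that would
 * run past the packet means "no match", like the cBPF semantics. */
static bool match_ipv4_udp_port53(const uint8_t *pkt, size_t len) {
    if (len < 14 + 20 + 4) return false;              /* eth + min IP + ports */
    if (load_be16(pkt + 12) != 0x0800) return false;  /* EtherType: IPv4 */
    const uint8_t *ip = pkt + 14;
    if (ip[9] != 17) return false;                    /* IP protocol: UDP */
    if (load_be16(ip + 6) & 0x1fff) return false;     /* not a first fragment */
    size_t ihl = 4u * (ip[0] & 0x0f);                 /* IP header length */
    if (len < 14 + ihl + 4) return false;
    const uint8_t *udp = ip + ihl;
    return load_be16(udp) == 53 || load_be16(udp + 2) == 53;
}
```
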
            <p>Internally, tcpdump uses libpcap to compile the filter to classic BPF (cBPF). cBPF is a simple bytecode language to represent programs that inspect the contents of a packet. A program returns non-zero to indicate that a packet matched the filter, and zero otherwise. The virtual machine that executes cBPF programs is very simple, featuring only two registers, <code>a</code> and <code>x</code>. There is no way of checking the length of the input packet<a href="/xdpcap/#fn1"><sup>[1]</sup></a>; instead any out of bounds packet access will terminate the cBPF program, returning 0 (no match). The full set of opcodes is listed in <a href="https://www.kernel.org/doc/Documentation/networking/filter.txt">the Linux documentation</a>. Returning to our example filter, <code>ip and udp port 53</code> compiles to the following cBPF program, expressed as an annotated flowchart:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7vcV3QJKDR7VK5bbrI8UKh/738d2cb9fe464ec27cc16dd9f4e08926/Screen-Shot-2019-08-27-at-10.43.41-AM.png" />
            
            </figure><p>Example cBPF filter flowchart</p><p>Tcpdump attaches the generated cBPF filter to a raw packet socket using a <code>setsockopt</code> system call with <code>SO_ATTACH_FILTER</code>. The kernel runs the filter on every packet destined for the socket, but only delivers matching packets. Tcpdump displays the delivered packets, or writes them to a pcap capture file for later analysis.</p>
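            <p>As an illustration of the attach mechanism (a minimal sketch on Linux, not tcpdump's actual code), a trivial accept-everything cBPF program can be attached to any socket with the same <code>setsockopt</code> call; tcpdump does this with its compiled filter on a raw packet socket:</p>

```c
#include <assert.h>
#include <linux/filter.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef SO_ATTACH_FILTER
#define SO_ATTACH_FILTER 26
#endif

/* Attach a one-instruction cBPF program to a socket: BPF_RET with a
 * non-zero constant, i.e. "match, deliver up to 0xffffffff bytes".
 * Returns 0 on success. */
static int attach_accept_all(int fd) {
    struct sock_filter accept_all[] = {
        BPF_STMT(BPF_RET | BPF_K, 0xffffffff),
    };
    struct sock_fprog prog = {
        .len = 1,
        .filter = accept_all,
    };
    return setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, &prog, sizeof(prog));
}
```

            <p>Unlike a raw packet socket, an ordinary UDP socket does not require privileges, so the attach itself can be demonstrated unprivileged.</p>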
    <div>
      <h2>xdpcap</h2>
      <a href="#xdpcap">
        
      </a>
    </div>
    <p>In the context of XDP, our tcpdump replacement should:</p><ul><li><p>Accept filters in the same filter language as tcpdump</p></li><li><p>Dynamically instrument XDP programs of interest</p></li><li><p>Expose matching packets to userspace</p></li></ul>
    <div>
      <h3>XDP</h3>
      <a href="#xdp">
        
      </a>
    </div>
    <p>XDP uses an extended version of the cBPF instruction set, eBPF, to allow arbitrary programs to run for each packet received by a network card, potentially modifying the packets. A stringent kernel verifier statically analyzes eBPF programs, ensuring that memory bounds are checked for every packet load.</p><p>XDP programs can return one of several actions:</p><ul><li><p><code>XDP_ABORTED</code>: Drop the packet, signaling that a program error occurred</p></li><li><p><code>XDP_DROP</code>: Drop the packet</p></li><li><p><code>XDP_TX</code>: Transmit the packet back out the network interface</p></li><li><p><code>XDP_PASS</code>: Pass the packet up the network stack</p></li></ul><p>eBPF introduces several new features, notably helper function calls, enabling programs to call functions exposed by the kernel. This includes retrieving or setting values in maps, key-value data structures that can also be accessed from userspace.</p>
    <div>
      <h3>Filter</h3>
      <a href="#filter">
        
      </a>
    </div>
    <p>A key feature of tcpdump is the ability to efficiently pick out packets of interest; packets are filtered before reaching userspace. To achieve this in XDP, the desired filter must be converted to eBPF.</p><p>cBPF is already used in our <a href="/l4drop-xdp-ebpf-based-ddos-mitigations/#bpf-support">XDP based DoS mitigation pipeline</a>: cBPF filters are first converted to C by cbpfc, and the result compiled with Clang to eBPF. Reusing this mechanism allows us to fully support libpcap filter expressions:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2cWmovp2qWyqPw96muVV6Y/4bfb47288ad02579c840042950979158/2-1.png" />
            
            </figure><p>Pipeline to convert pcap-filter expressions to eBPF via C using cbpfc</p><p>To remove the Clang runtime dependency, our cBPF compiler, cbpfc, was extended to directly generate eBPF:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6v22s8MPrGXaJIZwTdq6zk/0bc7df11722b408738234116542a7b3e/3-1.png" />
            
            </figure><p>Pipeline to convert pcap-filter expressions directly to eBPF using cbpfc</p><p>Converted to eBPF using cbpfc, <code>ip and udp port 53</code> yields:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3bMYRG5ajPayvdVoTelezY/6f3d6ed62167bd62ff0a4af51cca81ce/Screen-Shot-2019-08-27-at-10.44.20-AM.png" />
            
            </figure><p>Example cBPF filter converted to eBPF with cbpfc flowchart</p><p>The emitted eBPF requires a prologue, which is responsible for loading pointers to the beginning and end of the input packet into registers <code>r6</code> and <code>r7</code> respectively<a href="/xdpcap/#fn2"><sup>[2]</sup></a>.</p><p>The generated code follows a very similar structure to the original cBPF filter, but with:</p><ul><li><p>Bswap instructions to convert big endian packet data to little endian.</p></li><li><p>Guards to check the length of the packet before we load data from it. These are required by the kernel verifier.</p></li></ul><p>The epilogue can use the result of the filter to perform different actions on the input packet.</p><p>As mentioned earlier, we’re open sourcing cbpfc; <a href="https://github.com/cloudflare/cbpfc">the code and documentation are available on GitHub</a>. It can be used to compile cBPF to C, or directly to eBPF, and the generated code is accepted by the kernel verifier.</p>
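            <p>The guard-then-load pattern can be sketched in plain userspace C. This is a hypothetical helper modeled on the flowchart above, not cbpfc's actual output:</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the load pattern in the generated eBPF: `start` and `end`
 * stand in for r6/r7. Every 16-bit load is preceded by a
 * verifier-mandated bounds guard and followed by a big-endian-to-host
 * byte swap. The `ok` out parameter models cBPF's
 * "out of bounds => terminate with no match" behavior. */
static uint16_t guarded_load_be16(const uint8_t *start, const uint8_t *end,
                                  size_t off, bool *ok) {
    if (start + off + 2 > end) {  /* guard: would read past the packet */
        *ok = false;
        return 0;
    }
    *ok = true;
    /* bswap: packet data is big endian */
    return (uint16_t)(start[off] << 8 | start[off + 1]);
}
```
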
    <div>
      <h3>Instrument</h3>
      <a href="#instrument">
        
      </a>
    </div>
    <p>Tcpdump can start and stop capturing packets at any time, without requiring coordination from applications. This rules out modifying existing XDP programs to directly run the generated eBPF filter; the programs would have to be modified each time xdpcap is run. Instead, programs should expose a hook that can be used by xdpcap to attach filters at runtime.</p><p>xdpcap’s hook support is built around eBPF tail-calls. XDP programs can yield control to other programs using the tail-call helper. Control is never handed back to the calling program; the return code of the subsequent program is used instead. For example, consider two XDP programs, foo and bar, with foo attached to the network interface. Foo can tail-call into bar:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4MaJdTMGs41UECBvq7FqhA/e1919c95963cf44419fefd2a14044c87/5.png" />
            
            </figure><p>Flow of XDP program foo tail-calling into program bar</p><p>The program to tail-call into is configured at runtime, using a special eBPF program array map. eBPF programs tail-call into a specific index of the map, the value of which is set by userspace. From our example above, foo’s tail-call map holds a single entry:</p><table><tr><td><p><b>index</b></p></td><td><p><b>program</b></p></td></tr><tr><td><p>0</p></td><td><p>bar</p></td></tr></table><p>A tail-call into an empty index does nothing, so XDP programs always need to return an action of their own in case a tail-call fails. Once again, this is enforced by the kernel verifier. In the case of program foo:</p>
            <pre><code>int foo(struct xdp_md *ctx) {
    // tail-call into index 0 - program bar
    tail_call(ctx, &amp;map, 0);

    // tail-call failed, pass the packet
    return XDP_PASS;
}</code></pre>
            <p>To leverage this as a hook point, the instrumented programs are modified to always tail-call, using a map that is exposed to xdpcap by <a href="https://facebookmicrosites.github.io/bpf/blog/2018/08/31/object-lifetime.html#bpffs">pinning it to a bpffs</a>. To attach a filter, xdpcap sets it in the map. If no filter is attached, the instrumented program returns the correct action itself.</p><p>With a filter attached to program foo, we have:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/S8T2hX3KZvC28JFvsQFB7/ffdc88d58cc5cfa5e1027cf1045274b6/6.png" />
            
            </figure><p>Flow of XDP program foo tail-calling into an xdpcap filter</p><p>The filter must return the original action taken by the instrumented program to ensure the packet is processed correctly. To achieve this, xdpcap generates one filter program per possible XDP action, each one hard-coded to return that specific action. All the programs are set in the map:</p><table><tr><td><p><b>index</b></p></td><td><p><b>program</b></p></td></tr><tr><td><p>0 (<code>XDP_ABORTED</code>)</p></td><td><p>filter <code>XDP_ABORTED</code></p></td></tr><tr><td><p>1 (<code>XDP_DROP</code>)</p></td><td><p>filter <code>XDP_DROP</code></p></td></tr><tr><td><p>2 (<code>XDP_PASS</code>)</p></td><td><p>filter <code>XDP_PASS</code></p></td></tr><tr><td><p>3 (<code>XDP_TX</code>)</p></td><td><p>filter <code>XDP_TX</code></p></td></tr></table><p>By tail-calling into the correct index, the instrumented program determines the final action:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3LFc3RSQQ10cP2N4GrAvwY/cd0f7bba4755f21dd1483b1307ffcf0b/7.png" />
            
            </figure><p>Flow of XDP program foo tail-calling into xdpcap filters, one for each action</p><p>xdpcap provides a helper function that attempts a tail-call for the given action. Should it fail, the action is returned instead:</p>
            <pre><code>enum xdp_action xdpcap_exit(struct xdp_md *ctx, enum xdp_action action) {
    // tail-call into the filter using the action as an index
    tail_call((void *)ctx, &amp;xdpcap_hook, action);

    // tail-call failed, return the action
    return action;
}</code></pre>
            <p>This allows an XDP program to simply:</p>
            <pre><code>int foo(struct xdp_md *ctx) {
    return xdpcap_exit(ctx, XDP_PASS);
}</code></pre>
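            <p>The per-action filter table relies on a simple invariant: the program at index <code>i</code> is hard-coded to return action <code>i</code>, so the tail-call preserves the instrumented program's verdict. A plain C model of that invariant (hypothetical names; the values mirror the kernel's <code>enum xdp_action</code>):</p>

```c
#include <assert.h>

/* Model of the XDP action codes: ABORTED=0, DROP=1, PASS=2, TX=3. */
enum action_model { ABORTED, DROP, PASS, TX, N_ACTIONS };

typedef int (*filter_fn)(void);

/* One "filter" per action, each hard-coded to return that action.
 * Real xdpcap filters also run the packet match and emit a perf
 * event; that logic is elided here. */
static int ret_aborted(void) { return ABORTED; }
static int ret_drop(void)    { return DROP; }
static int ret_pass(void)    { return PASS; }
static int ret_tx(void)      { return TX; }

/* The "program array map": index i holds the filter returning i, so
 * tail-calling into index `action` yields `action` again. */
static const filter_fn filters[N_ACTIONS] = {
    [ABORTED] = ret_aborted, [DROP] = ret_drop,
    [PASS]    = ret_pass,    [TX]   = ret_tx,
};
```
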
            
    <div>
      <h3>Expose</h3>
      <a href="#expose">
        
      </a>
    </div>
    <p>Matching packets, as well as the original action taken for them, need to be exposed to userspace. Once again, such a mechanism is already part of our <a href="/l4drop-xdp-ebpf-based-ddos-mitigations/#packet-sampling">XDP based DoS mitigation pipeline</a>!</p><p>Another eBPF helper, <code>perf_event_output</code>, allows an XDP program to generate a perf event containing, among other metadata, the packet. As xdpcap generates one filter per XDP action, the filter program can include the action taken in the metadata. A userspace program can create a perf event ring buffer to receive the events, obtaining both the action and the packet.</p><ol><li><p>This is true of the original cBPF, but Linux implements a number of extensions, one of which allows the length of the input packet to be retrieved. <a href="/xdpcap/#fnref1">↩︎</a></p></li><li><p>This example uses registers <code>r6</code> and <code>r7</code>, but cbpfc can be configured to use any registers. <a href="/xdpcap/#fnref2">↩︎</a></p></li></ol> ]]></content:encoded>
            <category><![CDATA[Linux]]></category>
            <category><![CDATA[Developers]]></category>
            <category><![CDATA[Programming]]></category>
            <guid isPermaLink="false">2uU84JuP2Ez1MTF03l8xwY</guid>
            <dc:creator>Arthur Fabre</dc:creator>
        </item>
        <item>
            <title><![CDATA[L4Drop: XDP DDoS Mitigations]]></title>
            <link>https://blog.cloudflare.com/l4drop-xdp-ebpf-based-ddos-mitigations/</link>
            <pubDate>Wed, 28 Nov 2018 19:59:25 GMT</pubDate>
            <description><![CDATA[ Efficient packet dropping is a key part of Cloudflare’s distributed denial of service (DDoS) attack mitigations. In this post, we introduce a new tool in our packet dropping arsenal: L4Drop. ]]></description>
            <content:encoded><![CDATA[ <p>Efficient packet dropping is a key part of Cloudflare’s distributed denial of service (DDoS) attack mitigations. In this post, we introduce a new tool in our packet dropping arsenal: L4Drop.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1Md3ZoOlBITehVhbWnXZy0/01ff382d6c8a537f5f22a4617840a878/cover.jpg" />
            
            </figure><p><a href="https://www.usa.gov/government-works">Public domain</a> <a href="https://www.flickr.com/photos/usairforce/4293474325/in/photostream/">image</a> by US Air Force</p><p>We've written about our DDoS mitigation pipeline extensively in the past, covering:</p><ul><li><p><a href="/meet-gatebot-a-bot-that-allows-us-to-sleep/">Gatebot</a>: analyzes traffic hitting our edge and deploys DDoS mitigations matching suspect traffic.</p></li><li><p><a href="/introducing-the-bpf-tools/">bpftools</a>: generates Berkeley Packet Filter (BPF) bytecode that matches packets based on DNS queries, <a href="/introducing-the-p0f-bpf-compiler/">p0f signatures</a>, or tcpdump filters.</p></li><li><p>Iptables: matches traffic against the BPF generated by bpftools using the <code>xt_bpf</code> module, and drops it.</p></li><li><p><a href="/kernel-bypass/">Floodgate</a>: offloads work from iptables during big attacks that could otherwise overwhelm the kernel networking stack. Incoming traffic bypasses the kernel to go directly to a BPF interpreter in userspace, which efficiently drops packets matching the BPF rules produced by bpftools.</p></li></ul><p>Both iptables and Floodgate send samples of received traffic to Gatebot for analysis, and filter incoming packets using rules generated by bpftools. This ends up looking something like this:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/EhPnzNXl1KKxTnSIBx0ON/2e502b854d229bd811ece87f885c5656/floodgate.svg" />
            
            </figure><p>Floodgate based DDoS mitigation pipeline</p><p>This pipeline has served us well, but a lot has changed since we implemented Floodgate. Our new <a href="/a-tour-inside-cloudflares-g9-servers/">Gen9</a> and <a href="/arm-takes-wing/">ARM</a> servers use different network interface cards (NIC) than our earlier servers. These new NICs aren’t compatible with Floodgate as it relies on a proprietary Solarflare technology to redirect traffic directly to userspace. Floodgate’s time was finally up.</p>
    <div>
      <h3>XDP to the rescue</h3>
      <a href="#xdp-to-the-rescue">
        
      </a>
    </div>
    <p>A new alternative to the kernel bypass approach has been added to Linux: <a href="http://docs.cilium.io/en/latest/bpf/">eXpress Data Path (XDP)</a>. XDP uses an extended version of the classic BPF instruction set, eBPF, to allow arbitrary code to run for each packet received by a network card driver. As <a href="/how-to-drop-10-million-packets/">Marek demonstrated</a>, this enables high speed packet dropping! eBPF introduces a slew of new features, including:</p><ul><li><p>Maps, key-value data structures shared between the eBPF programs and userspace.</p></li><li><p>A Clang eBPF backend, allowing a sizeable subset of C to be compiled to eBPF.</p></li><li><p>A stringent kernel verifier that statically analyzes eBPF programs, ensuring run time performance and safety.</p></li></ul><p>Compared to our partial kernel bypass, XDP does not require busy polling for packets. This enables us to leave an XDP based solution “always on” instead of enabling it only when attack traffic exceeds a set threshold. XDP programs can also run on multiple CPUs, potentially allowing a higher number of packets to be processed than Floodgate, which was pinned to a single CPU to limit the impact of busy polling.</p><p>Updating our pipeline diagram with XDP yields:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4iszhBhHuL6NX7V4PhUznK/304a8e726b5c3fd63b31c06a65db5b29/xdp.svg" />
            
            </figure><p>XDP based DDoS mitigation pipeline</p>
    <div>
      <h3>Introducing L4Drop</h3>
      <a href="#introducing-l4drop">
        
      </a>
    </div>
    <p>All that remains is to convert our existing rules to eBPF! At first glance, it seems we should be able to store our rules in an eBPF map and have a single program that checks incoming packets against them. <a href="http://vger.kernel.org/lpc-networking.html#session-15__">Facebook’s firewall</a> implements this strategy. This allows rules to be easily inserted or removed from userspace.</p><p>However, the filters created by bpftools rely heavily on matching arbitrary packet data, and performing arbitrary comparisons across headers. For example, a single p0f signature can check both IP &amp; TCP options. On top of this, the thorough static analysis performed by the kernel’s eBPF verifier currently disallows loops. This restriction helps ensure that a given eBPF program will always terminate in a set number of instructions. Coupled together, the arbitrary matching and lack of loops prevent us from storing our rules in maps.</p><p>Instead, we wrote a tool to compile the rules generated by Gatebot and bpftools to eBPF. This allows the generated eBPF to match against any packet data it needs, at the cost of:</p><ul><li><p>Having to recompile the program to add or remove rules</p></li><li><p>Possibly hitting eBPF code complexity limits enforced by the kernel with many rules</p></li></ul><p>A C program is generated from the rules built by Gatebot, and compiled to eBPF using Clang. All that’s left is to reimplement the iptables features we use.</p>
    <div>
      <h4>BPF support</h4>
      <a href="#bpf-support">
        
      </a>
    </div>
    <p>We have many different tools for generating BPF filters, and we need to be able to include these filters in the eBPF generated by L4Drop. While the name eBPF might suggest a minor extension to BPF, the instruction sets are not compatible. In fact, BPF instructions don't even have a one-to-one mapping to eBPF! This can be seen in the <a href="https://elixir.bootlin.com/linux/v4.19.3/source/net/core/filter.c#L752">kernel's internal BPF to eBPF converter</a>, where a single BPF IP header length instruction maps to 6 eBPF instructions.</p><p>To simplify the conversion, we implemented a BPF to C compiler. This allows us to include any BPF program in the aforementioned C program generated by L4Drop. For example, if we generate a BPF program matching a DNS query to any subdomain of example.com using bpftools, we get:</p>
            <pre><code>$ ./bpfgen dns -- "*.example.com"
18,177 0 0 0,0 0 0 20,12 0 0 0,...</code></pre>
            <p>Converted to C, we end up with:</p>
            <pre><code>bool cbpf_0_0(uint8_t *data, uint8_t *data_end) {
    __attribute__((unused))
    uint32_t a, x, m[16];

    if (data + 1 &gt; data_end) return false;
    x = 4*(*(data + 0) &amp; 0xf);

    ...
}</code></pre>
            <p>The BPF instructions each expand to a single C statement, and the BPF registers (<code>a</code>, <code>x</code> and <code>m</code>) are emulated as variables. This has the added benefit of allowing Clang to optimize the full program. The generated C includes only the guards needed to prevent out of bounds packet accesses, as required by the kernel.</p>
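            <p>For example, <code>x = 4*(*(data + 0) &amp; 0xf)</code> in the generated code above is the classic BPF idiom for loading the IPv4 header length into <code>x</code>. As a standalone, runnable sketch of this emulated-register style (hypothetical function name, not actual compiler output):</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of the "ldxb 4*([0]&0xf)" cBPF instruction as emitted by a
 * BPF-to-C compiler: the x register becomes an out parameter, and the
 * guard models "out of bounds access terminates with no match". */
static bool ip_header_len(const uint8_t *data, const uint8_t *data_end,
                          uint32_t *x) {
    if (data + 1 > data_end) return false;  /* bounds guard */
    *x = 4u * (*(data + 0) & 0xf);          /* IHL is in 32-bit words */
    return true;
}
```
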
    <div>
      <h4>Packet sampling</h4>
      <a href="#packet-sampling">
        
      </a>
    </div>
    <p>Gatebot requires all traffic received by a server to be sampled at a given rate, and sent off for analysis. This includes dropped packets. Consequently, we have to sample before we drop anything. Thankfully, eBPF can call into the kernel using a restricted set of helper functions, and one of these, <code>bpf_xdp_event_output</code>, allows us to copy packets to a perf event ring buffer. A userspace daemon then reads from the perf buffer, obtaining the packets. Coupled with another helper, <code>bpf_get_prandom_u32()</code>, to generate random numbers, the C code to sample packets ends up something like:</p>
            <pre><code>// Threshold may be &gt; UINT32_MAX
uint64_t rnd = (uint64_t)get_prandom_u32();

if (rnd &lt; threshold) {
    // perf_event_output passes the number of bytes to sample in the
    // high 32 bits of the flags parameter.
    uint64_t flags = len &lt;&lt; 32;

    // Use the current CPU number as index to sampled_packets.
    flags |= BPF_F_CURRENT_CPU;


    // Write the packet in ctx to the perf buffer
    if (xdp_event_output(ctx, &amp;sampled_packets, flags, &amp;len, sizeof(len))) {
        return XDP_ABORTED;
    }
}</code></pre>
            <p>The <a href="https://github.com/newtools/ebpf">newtools/ebpf</a> library we use to load eBPF programs into the kernel supports creating the required eBPF map (<code>sampled_packets</code> in this example), and reading from the perf buffer in userspace.</p>
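            <p>The two fixed-point tricks in the snippet above can be checked in plain C (hypothetical helper names): the sampling threshold is the desired rate scaled to the 32-bit range, and the flags word packs the sample length into its high 32 bits:</p>

```c
#include <assert.h>
#include <stdint.h>

/* Model of the kernel's BPF_F_CURRENT_CPU flag (all low 32 bits set). */
#define CURRENT_CPU_MODEL 0xffffffffULL

/* rnd is uniform in [0, 2^32); sampling when rnd < threshold gives a
 * sample probability of threshold / 2^32. For a 1-in-1 rate the
 * threshold is 2^32, which exceeds UINT32_MAX, hence the 64-bit
 * comparison in the eBPF code above. */
static uint64_t threshold_for_rate(uint32_t one_in_n) {
    return (1ULL << 32) / one_in_n;
}

/* The number of bytes to sample lives in the high 32 bits of the
 * flags parameter; the low bits select the CPU/map index. */
static uint64_t sample_flags(uint64_t len) {
    return (len << 32) | CURRENT_CPU_MODEL;
}
```
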
    <div>
      <h4>Geo rules</h4>
      <a href="#geo-rules">
        
      </a>
    </div>
    <p>With our <a href="https://www.cloudflare.com/network/">large anycast network</a>, a truly distributed denial of service attack will impact many of our data centers. But not all attacks have this property. We sometimes require location specific rules.</p><p>To avoid having to build separate eBPF programs for every location, we want the ability to enable or disable rules before loading a program, but after compiling it.</p><p>One approach would be to store whether a rule is enabled or not in an eBPF map. Unfortunately, such map lookups can increase the code size. Due to the kernel’s strict code complexity limits for XDP code, this reduces the number of rules that fit in a single program. Instead, we modify the generated eBPF ELF before loading it into the kernel.</p><p>If, in the original C program, every rule is guarded by a conditional like so:</p>
            <pre><code>int xdp_test(struct xdp_md *ctx) {
    unsigned long long enabled;

    asm("%0 = 0 ll" : "=r"(enabled));

    if (enabled) {
        // Check the packet against the rule
        return XDP_DROP;
    } else {
        return XDP_PASS;
    }
}</code></pre>
            <p><code>asm("%0 = 0 ll" : "=r"(enabled))</code> will emit a single 64-bit eBPF load instruction, loading <code>0</code> into the register holding the variable <code>enabled</code>:</p>
            <pre><code>$ llvm-objdump-6.0 -S test.o

test.o: file format ELF64-BPF

Disassembly of section prog:
xdp_test:
       0:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00         r1 = 0 ll
       2:       b7 00 00 00 02 00 00 00         r0 = 2
       3:       15 01 01 00 00 00 00 00         if r1 == 0 goto +1 &lt;LBB0_2&gt;
       4:       b7 00 00 00 01 00 00 00         r0 = 1

LBB0_2:
       5:       95 00 00 00 00 00 00 00         exit</code></pre>
            <p>Modifying the ELF to change the load instruction to load <code>0</code> or <code>1</code> will change the value of <code>enabled</code>, enabling or disabling the rule. The kernel even <a href="https://elixir.bootlin.com/linux/v4.19.3/source/kernel/bpf/verifier.c#L3868">trims conditionals against constants like these</a>.</p><p>Modifying the instructions requires the ability to differentiate these special loads from ones normally emitted by Clang. Changing the asm to load a symbol (<code>asm("%0 = RULE_0_ENABLED ll" : "=r"(enabled))</code>) ensures it shows up in the ELF relocation info with that symbol name:</p>
            <pre><code>$ llvm-readelf-6.0 -r test.o

Relocation section '.relprog' at offset 0xf0 contains 1 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name
0000000000000000  0000000200000001 R_BPF_64_64            0000000000000000 RULE_0_ENABLED</code></pre>
            <p>This enables the <a href="https://github.com/newtools/ebpf">newtools/ebpf</a> loader to parse the ELF relocation info, and always find the correct load instruction that guards enabling or disabling a rule.</p>
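            <p>The patch itself is small. As the <code>r1 = 0 ll</code> disassembly above shows, a 64-bit load (opcode <code>0x18</code>) occupies two 8-byte instruction slots, with the 64-bit immediate split across the <code>imm</code> fields (bytes 4 to 7) of each slot. A sketch of the in-place patch, assuming a little-endian host and using a hypothetical helper (not the actual newtools/ebpf code):</p>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Patch the 64-bit immediate of a ld_imm64 instruction located at
 * byte offset `off` in a raw eBPF instruction stream. The low 32 bits
 * of the value go in bytes 4-7 of the first 8-byte slot, the high 32
 * bits in bytes 4-7 of the second. Returns 0 on success, -1 if the
 * instruction at `off` is not a ld_imm64. */
static int patch_ld_imm64(uint8_t *insns, size_t off, uint64_t value) {
    if (insns[off] != 0x18) return -1;  /* not a 64-bit load */
    uint32_t lo = (uint32_t)value;
    uint32_t hi = (uint32_t)(value >> 32);
    memcpy(insns + off + 4, &lo, 4);    /* little-endian host assumed */
    memcpy(insns + off + 12, &hi, 4);
    return 0;
}
```
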
    <div>
      <h3><b>Production</b></h3>
      <a href="#production">
        
      </a>
    </div>
    <p>L4Drop is running in production across all of our servers, and protecting us against DDoS attacks. For example, this server dropped over 8 million packets per second:</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/596ZBycRpdqEzHLKxWI09H/f443840e5a74888b17a329eccc883ba9/combined-rx-narrow.png" />
            
            </figure><p>Received, dropped and lost packets per second vs. CPU usage on a server during a DDoS attack</p><p>The graph shows a sudden increase in received packets (red). Initially, the attack overwhelmed the kernel network stack, causing some packets to be lost on the network card (yellow). The overall CPU usage (magenta) rose sharply.</p><p>Gatebot detected the attack shortly thereafter, deploying a rule matching the attack traffic. L4Drop began dropping the relevant packets (green), reaching over 8 million dropped packets per second.</p><p>The amount of traffic dropped (green) closely followed the received traffic (red), and the amount of traffic passed through remained unchanged before, during, and after the attack. This highlights the effectiveness of the deployed rule. At one point the attack traffic changed slightly, leading to a gap between the dropped and received traffic until Gatebot could respond with a new rule.</p><p>During the brunt of the attack, the overall CPU usage (magenta) only rose by about 10%, demonstrating the efficiency of XDP.</p><p>The softirq CPU usage (blue) shows the CPU usage under which XDP / L4Drop runs, but also includes other network related processing. It increased by slightly over a factor of 2, while the number of incoming packets per second increased by over a factor of 40!</p>
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>While we’re happy with the performance of L4Drop so far, our pipeline is in a constant state of improvement. We’re working on supporting a greater number of simultaneous rules in L4Drop through multiple, chained, eBPF programs. Another point of focus is increasing the efficiency of our generated programs, and supporting new eBPF features. Reducing our attack detection delay would also allow us to deploy rules more quickly, leading to fewer lost packets at the onset of an attack.</p>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Attacks]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Speed & Reliability]]></category>
            <guid isPermaLink="false">40D8ULOdPAsZwViKbE1SlP</guid>
            <dc:creator>Arthur Fabre</dc:creator>
        </item>
    </channel>
</rss>