Cloudflare Repositories FTW

This is a guest post by Jim “Elwood” O’Gorman, one of the maintainers of Kali Linux. Kali Linux is a Debian based GNU/Linux distribution popular amongst the security research communities.

Kali Linux turned six years old this year!

In this time, Kali has established itself as the de-facto standard open source penetration testing platform. On a quarterly basis, we release updated ISOs for multiple platforms, pre-configured virtual machines, Kali Docker, WSL, Azure, AWS images, tons of ARM devices, Kali NetHunter, and on and on and on. This has lead to Kali being trusted and relied on to always being there for both security professionals and enthusiasts alike.

But that popularity has always led to one complication: How to get Kali to people?

With so many different downloads plus the apt repository, we have to move a lot of data. To accomplish this, we have always relied on our network of first- and third-party mirrors.

The way this works is, we run a master server that pushes out to a number of mirrors. We then pay to host a number of servers that are geographically dispersed and use them as our first-party mirrors. Then, a number of third parties donate storage and bandwidth to operate third-party mirrors, ensuring that we have even more systems that are geographically close to you. When you go to download, you hit a redirector that will send you to a mirror that is close to you, ideally allowing you to download your files quickly.

This solution has always been pretty decent, however it has some drawbacks. First, our network of first-party mirrors is expensive. Second, some mirrors are not as good as others. Nothing is worse than trying to download Kali and getting sent to a slow mirror, where your download might drag on for hours. Third, we always always need more mirrors as Kali continues to grow in popularity.

This situation led to us encountering Cloudflare thanks to some extremely generous outreach

https://t.co/k6M5UZxhWF and we can chat more about your specific use case.
— Justin (@xxdesmus) June 29, 2018

I will be honest, we are a bunch of security nerds, so we were a bit skeptical at first. We have some pretty unique needs, we use a lot of bandwidth, syncing an apt repository to a CDN is no small task, and well, we are paranoid. We have an average of 1,000,000 downloads a month on just our ISO images. Add in our apt repos and you are talking some serious, serious traffic. So how much help could we really expect from Cloudflare anyway? Were we really going to be able to put this to use, or would this just be a nice fancy front end to our website and nothing else?

On the other hand, it was a chance to use something new and shiny, and it is an expensive product, so of course we dove right in to play with it.

Initially we had some sync issues. A package repository is a mix of static data (binary and source packages) and dynamic data (package lists are updated every 6 hours). To make things worse, the cryptographic sealing of the metadata means that we need atomic updates of all the metadata (the signed top-level ‘Release’ file contains checksums of all the binary and source package lists).

The default behavior of a CDN is not appropriate for this purpose as it caches all files for a certain amount of time after they have been fetched for the first time. This means that you could have different versions of various metadata files in the cache, resulting in invalid checksums errors returned by apt-get. So we had to implement a few tweaks to make it work and reap the full benefits of Cloudflare’s CDN network.

First we added an “Expires” HTTP header to disable expiration of all files that will never change. Then we added another HTTP header to tag all metadata files so that we could manually purge those files from the CDN cache through an API call that we integrated at the end of the repository update procedure on our backend server.

With nginx in our backend, the configuration looks like this:

location /kali/dists/ {
    add_header Cache-Tag metadata,dists;
}
location /kali/project/trace/ {
    add_header Cache-Tag metadata,trace;
    expires 1h;
}
location /kali/pool/ {
    add_header Cache-Tag pool;
    location ~ \.(deb|udeb|dsc|changes|xz|gz|bz2)$ {
        expires max;
    }
}

The API call is a simple shell script launched by a hook of the repository mirroring script:

#!/bin/sh
curl -sS -X POST "https://api.cloudflare.com/client/v4/zones/xxxxxxxxxxx/purge_cache" \
    -H "Content-Type:application/json" \
    -H "X-Auth-Key:XXXXXXXXXXXXX" \
    -H "X-Auth-Email:[email protected]" \
    --data '{"tags":["metadata"]}'

With this simple yet powerful feature, we ensure that the CDN cache always contains consistent versions of the metadata files. Going further, we might want to configure Prefetching so that Cloudflare downloads all the package lists as soon as a user downloads the top-level ‘Release’ file.

In short, we were using this system in a way that was never intended, but it worked! This really reduced the load on our backend, as a single server could feed the entire CDN. Putting the files geographically close to users, allowing the classic apt dist-upgrade to occur much, much faster than ever before.

A huge benefit, and was not really a lot of work to set up. Sevki Hasirci was there with us the entire time as we worked through this process, ensuring any questions we had were answered promptly. A great win.

However, there was just one problem.

Looking at our logs, while the apt repo was working perfectly, our image distribution was not so great. None of those images were getting cached, and our origin server was dying.

Talking with Sevki, it turns out there were limits to how large of a file Cloudflare would cache. He upped our limit to the system capacity, but that still was not enough for how large some of our images are. At this point, we just assumed that was that--we could use this solution for the repo but for our image distribution it would not help. However, Sevki told us to wait a bit. He had a surprise in the works for us.

After some development time, Cloudflare pushed out an update to address our issue, allowing us to cache very large files. With that in place, everything just worked with no additional tweaking. Even items like partial downloads for users using download accelerators worked just fine. Amazing!

To show an example of what this translated into, let’s look at some graphs. Once the very large file support was added and we started to push out our images through Cloudflare, you could see that there is not a real increase in requests:

However, looking at Bandwidth there is a clear increase:

After it had been implemented for a while, we see a clear pattern.

This pushed us from around 80 TB a week when we had just the repo, to now around 430TB a month when its repo and images. As you can imagine, that's an amazing bandwidth savings for an open source project such as ours.

Performance is great, and with a cache hit rate of over 97% (amazingly high considering how often and frequently files in our repo changes), we could not be happier.

So what’s next? That's the question we are asking ourselves. This solution has worked so well, we are looking at other ways to leverage it, and there are a lot of options. One thing is for sure, we are not done with this.

Thanks to Cloudflare, Sevki, Justin, and Matthew for helping us along this path. It is fair to say this is the single largest contribution to Kali that we have received outside of the support by Offensive Security.

Support we received from Cloudflare was amazing. The Kali project and community thanks you immensely every time they update their distribution or download an image.

The Cloudflare Blog

Cloudflare Repositories FTW

QUIC restarts, slow problems: udpgrm to the rescue

A steam locomotive from 1993 broke my yarn test

Searching for the cause of hung tasks in the Linux kernel

Multi-Path TCP: revolutionizing connectivity, one path at a time