March 19, 2020 12:57PM
Keepalives considered harmful
NGINX
Linux
Performance
You’d think keepalives would always be helpful, but it turns out reality isn’t always what you expect. It really helps if you read “Why does one NGINX worker take all the load?” first....
August 24, 2018 4:11PM
Introducing ebpf_exporter
Speed & Reliability
eBPF
Linux
Programming
Here at Cloudflare we use Prometheus to collect operational metrics. We run it on hundreds of servers and ingest millions of metrics per second to get insight into our network and provide the best possible service to our customers....
May 13, 2018 5:00PM
Tracing System CPU on Debian Stretch
Speed & Reliability
Kafka
eBPF
Linux
How an innocent OS upgrade triggered a cascade of issues and forced us into tracing Linux networking internals....
March 05, 2018 4:17PM
Squeezing the firehose: getting the most from Kafka compression
Compression
Speed & Reliability
Kafka
How Cloudflare was able to save hundreds of gigabits of network bandwidth and terabytes of storage from Kafka....
December 14, 2016 2:25PM
Manage Cloudflare records with Salt
Salt
GitHub
API
DNS
Reliability
We use Salt to manage our ever-growing global fleet of machines. Salt is great for managing configurations and being the source of truth. We use it for remote command execution and for network automation tasks....
December 07, 2016 2:11PM
Debugging war story: the mystery of NXDOMAIN
DNS
Reliability
NXDOMAIN
The following blog post describes a debugging adventure on Cloudflare's Mesos-based cluster. This internal cluster is primarily used to process log file information so that Cloudflare customers have analytics, and for our systems that detect and respond to attacks....