One of our large scale data infrastructure challenges here at Cloudflare is around providing HTTP traffic analytics to our customers. HTTP Analytics is available to all our customers via two options:
Processor problems have been in the news lately, due to the Meltdown and Spectre vulnerabilities. But generally, engineers writing software assume that computer hardware operates in a reliable, well-understood fashion, and that any problems lie on the software side of the software-hardware divide.
Six years ago when I joined Cloudflare the company had a capital F, about 20 employees, and a software stack that was mostly NGINX, PHP and PowerDNS (there was even a little Apache).
In a recent blog post we discussed epoll behavior causing uneven load among NGINX worker processes. We suggested a work around - the REUSEPORT socket option.
Scaling up TCP servers is usually straightforward. Most deployments start by using a single process setup. When the need arises more worker processes are added.