November 27, 2020 12:00PM
A Byzantine failure in the real world
Post Mortem, API, Postgres, Outage
When we review design documents at Cloudflare, we are always on the lookout for Single Points of Failure (SPOFs). In this post, we present a timeline of a real-world incident and explain how an interesting failure mode known as a Byzantine fault played a role in a cascading series of events....