(Check for the latest updates at the end of this blog: Internet traffic started to come back at around July 9, 01:00 UTC, after 17 hours)
An outage at one of the largest ISPs in Canada, Rogers Communications, started earlier today, July 8, 2022, and is ongoing (eight hours and counting), and is impacting businesses and consumers. At the time of writing, we are seeing a very small amount of traffic from Rogers, but we are only seeing residual traffic, and nothing close to a full recovery to normal traffic levels.
Based on what we’re seeing and similar incidents in the past, we believe this is likely to be an internal error, not a cyber attack.
BGP is a mechanism to exchange routing information between networks on the Internet. The big routers that make the Internet work have huge, constantly updated lists of the possible routes that can be used to deliver each network packet to its final destination. Without BGP, the Internet routers wouldn't know what to do, and the Internet wouldn't exist.
The Internet is literally a network of networks, or for the maths fans, a graph, with each individual network a node in it, and the edges representing the interconnections. All of this is bound together by BGP. BGP allows one network (say Rogers) to advertise its presence to other networks that form the Internet. Rogers is not advertising its presence, so other networks can’t find Rogers network and so it is unavailable.
A BGP update message informs a router of changes made to a prefix (a group of IP addresses) advertisement or entirely withdraws the prefix. In this next chart, we can see that at 08:45 there was a withdrawal of prefixes from Rogers ASN.
Since then, at 14:30, attempts seem to be made to advertise their prefixes again. This maps to us seeing a slow increase in traffic again from Rogers’ end users.
The graph below, which shows the prefixes we were receiving from Rogers in Toronto, clearly shows the withdrawal of prefixes around 08:45, and the slow start in recovery at 14:30, with another round of withdraws at around 15:45.
Outages happen more regularly than people think. This week we did an Internet disruptions overview for Q2 2022 where you can get a better sense of that, and on how collaborative and interconnected the Internet (the network of networks) is. And not so long ago Facebook had an hours long outage where BGP updates showed Facebook itself disappearing from the Internet.
UPDATE: July 8, 2022, 23:00 UTC (19:00 EST)
The Rogers outage is still ongoing after 15 hours without clear signs of Internet traffic fully coming back.
At around 18:15 there was a small bump in traffic (only around 3% of the usual traffic at that time) that lasted for about 30 minutes, quickly returning to the ongoing outage. Here is the representation of that increase in AS812.
Here we can see an update on the BGP announcements. Rogers is still trying to get their services back online with new spikes in announcements to advertise their prefixes, but instants later it all seems to crumble again with the withdrawal of prefixes. The latest attempt was at 21:45 UTC:
It seems that the internal sessions in the Rogers core network flap, causing withdrawals when going down, and advertisements when coming up.
Rogers Senior Vice President, Kye Prigg, said a few hours ago in an interview that they haven’t identified the root cause for the outage and still don’t have an estimate on when the service will be restored.
UPDATE: July 9, 2022, 01:50 UTC (21:50 EST)
We are seeing partial recovery of traffic from the Rogers network, mostly after 00:15 UTC. The current traffic rate (01:50 UTC) is at about 17.5% of the rate from 24 hours before, an hour ago it was 13%. More than 17 hours have passed since the outage started.
We continue to see frequent BGP announcement and withdrawals originated from Rogers network, which indicates the core network flapping issue has not been fully resolved at this moment.
UPDATE: July 9, 2022, 09:00 UTC (05:00 EST)
Rogers traffic seems to be climbing back up to usual levels, for the past eight hours. Cloudflare's data shows that Saturday, July 9, at 08:40 UTC, there was around 76% of the previous day traffic at the same time. You can track it here.