We extensively monitor our network and use multiple systems that give us visibility, including external monitoring and internal alerts when things go wrong. One of the most useful is Grafana, which lets us quickly create arbitrary dashboards. And heavy users of Grafana we are: at last count we had 645 different Grafana dashboards configured in our system!
grafana=> select count(1) from dashboard;
 count
-------
   645
(1 row)
This post is not about our Grafana systems though. It's about something we noticed a few days ago, while looking at one of those dashboards. We noticed this:
This chart shows the number of HTTP requests per second handled by our systems globally. You can clearly see multiple spikes, and this chart most definitely should not look like a porcupine! The spikes were large in scale - 500k to 1M HTTP requests per second. Something very strange was going on.
Tracing the spikes
Our intuition indicated an attack - but our attack mitigation systems didn't confirm it. We'd seen no major HTTP attacks at those times.
It would be bad if we were under such a heavy HTTP attack and our mitigation systems didn't notice it. Without more ideas, we went back to one of our favorite debugging tools: tcpdump.
The spikes happened every 80 minutes and lasted about 10 minutes. We waited, and tried to catch the offending traffic. Here is what the HTTP traffic looked like on the wire:
The client had sent some binary junk to our HTTP server on port 80; they weren't even sending a fake GET or POST line!
Our server politely responded with an HTTP 400 error. This explains why the traffic wasn't caught by our attack mitigation systems. Invalid HTTP requests don't trigger our HTTP DDoS mitigations - it makes no sense to mitigate traffic which is never accepted by NGINX in the first place!
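To illustrate why such a payload is rejected outright, here is a minimal sketch of a check for a plausible HTTP request line. It is not how NGINX parses requests; the function name looks_like_http_request and the method list are assumptions made only for this example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the buffer starts with a common
 * HTTP method followed by a space, 0 otherwise. A real server does
 * far more validation; this only shows the first hurdle.
 */
static int looks_like_http_request(const unsigned char *buf, size_t len)
{
    static const char *const methods[] = {
        "GET ", "HEAD ", "POST ", "PUT ", "DELETE ",
        "OPTIONS ", "PATCH ", "CONNECT ", "TRACE "
    };

    for (size_t i = 0; i < sizeof methods / sizeof methods[0]; i++) {
        size_t mlen = strlen(methods[i]);
        if (len >= mlen && memcmp(buf, methods[i], mlen) == 0)
            return 1;
    }
    return 0;
}
```

Binary junk that does not begin with a method token fails this check immediately, which matches the behaviour described above: the server answers 400 and the request never reaches the later processing stages.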
The payload
At first glance the payload sent to HTTP servers seems random. A colleague of mine, Chris Branch, investigated and proved me wrong. The payload has patterns.
Let me show what's happening. Here are the first 24 bytes of the mentioned payload:
If you look closely, the pattern will start to emerge. Let's add some colors and draw it in not eight, but seven bytes per row:
This checkerboard-like pattern is exhibited in most of the requests with payload sizes below 512 bytes.
Another engineer pointed out that there actually appear to be two separate sequences generated in the same fashion. Starting with the a6 and the cb, take alternating bytes:
a6 ef 39 82 cb 15 5e a7 f0 3a 83 cc 16 5f
cb 15 5e a7 f0 3a 83 cc 16 5f a8 f1 3b
Aligning that differently shows that the second sequence is essentially the same as the first:
a6 ef 39 82 cb 15 5e a7 f0 3a 83 cc 16 5f
            cb 15 5e a7 f0 3a 83 cc 16 5f a8 f1 3b
Treating that as one sequence gives
a6 ef 39 82 cb 15 5e a7 f0 3a 83 cc 16 5f a8 f1 3b
which is generated by starting at ef and repeatedly adding the following repeating sequence of steps (dropping any carry past one byte):
4a 49 49 4a 49 49 49
The 'random' binary junk is actually generated by some simple code.
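As a rough check of that observation, the sketch below regenerates the bytes by starting at a given value and adding the repeating steps, letting the 8-bit arithmetic wrap. The names STEPS and fill_junk are hypothetical and exist only for this illustration; this is not the code that produced the traffic.

```c
#include <stddef.h>
#include <stdint.h>

/* Repeating step pattern quoted above: 4a 49 49 4a 49 49 49. */
static const uint8_t STEPS[7] = {0x4a, 0x49, 0x49, 0x4a, 0x49, 0x49, 0x49};

/*
 * Fill out[0..n-1] with the sequence obtained by starting at `start`
 * and adding the next step each time. uint8_t arithmetic wraps at 256,
 * so the carry is discarded automatically.
 */
static void fill_junk(uint8_t *out, size_t n, uint8_t start)
{
    uint8_t byte = start;

    for (size_t i = 0; i < n; i++) {
        out[i] = byte;
        byte = (uint8_t)(byte + STEPS[i % 7]);
    }
}
```

Calling fill_junk(buf, 16, 0xef) yields ef 39 82 cb 15 5e a7 f0 3a 83 cc 16 5f a8 f1 3b, the tail of the sequence shown above.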
The length distribution of the requests is also interesting. Here's the histogram showing the popularity of particular lengths of payloads.
About 80% of the junk requests we received had a length of up to 511 bytes, uniformly distributed.
The remaining 20% had lengths uniformly distributed between 512 and 2047 bytes, with a few interesting spikes. For some reason lengths of 979, 1383 and 1428 bytes stand out. The rest of the distribution looks uniform.
The scale
The spikes were large. It takes a lot of firepower to generate a spike in our global HTTP statistics! On the first day the spikes reached about 600k junk requests per second. On the second day they went up to 1M rps. In total we recorded 37 spikes.
Geography
Unlike L3 attacks, L7 attacks require TCP/IP connections to be fully established. That means the source IP addresses are not spoofed and can be used to investigate the geographic distribution of attacking hosts.
The spikes were generated by IP addresses from all around the world. We recorded IP addresses from 4,912 distinct Autonomous Systems. Here are the top ASNs by number of unique attacking IP addresses:
Percent of unique IP addresses seen:
21.51%  AS36947  # AS de Algerie Telecom, Algeria
 5.34%  AS18881  # Telefonica Brasil S.A, Brazil
 3.60%  AS7738   # Telemar Norte Leste S.A., Brazil
 3.48%  AS27699  # Telefonica Brasil S.A, Brazil
 3.37%  AS28573  # CLARO S.A., Brazil
 3.20%  AS8167   # Brasil Telecom S/A, Brazil
 2.44%  AS2609   # Tunisia BackBone, Tunisia
 2.22%  AS6849   # PJSC "Ukrtelecom", Ukraine
 1.77%  AS3320   # Deutsche Telekom AG, Germany
 1.73%  AS12322  # Free SAS, France
 1.73%  AS8452   # TE-AS, Egypt
 1.35%  AS12880  # Information Technology Company, Iran
 1.30%  AS37705  # TOPNET, Tunisia
 1.26%  AS53006  # Algar Telecom S/A, Brazil
 1.22%  AS36903  # ASN du reseaux MPLs de Maroc Telecom, Morocco
... 4897 AS numbers below 1% of IP addresses.
You get the picture - the traffic was sourced from all over the place, with a bias towards South America and North Africa. Here is the country distribution of attacking IPs:
Percent of unique IP addresses seen:
31.76% BR
21.76% DZ
7.49% UA
5.73% TN
4.89% IR
3.96% FR
3.76% DE
2.09% EG
1.78% SK
1.36% MA
1.15% GB
1.05% ES
... 109 countries below 1% of IP addresses
The traffic was truly global and launched with IPs from 121 countries. This kind of globally distributed attack is where Cloudflare's Anycast network shines. During these spikes the load was nicely distributed across dozens of datacenters. Our datacenter in São Paulo absorbed the most traffic, roughly 4 times more traffic than the second in line - Paris. This chart shows how the traffic was distributed across many datacenters:
Unique IPs
During each of the spikes our systems recorded 200k unique source IP addresses sending us junk requests.
Normally we would conclude that whoever generated the attack controlled roughly 200k bots, and that's it. But these spikes were different. It seems the bots rotated IPs aggressively. Here is an example: during 16 spikes spread over 24 hours we recorded a total of a whopping 1.2M unique IP addresses attacking us.
This can be explained by bots churning through IP addresses. We believe that out of the estimated 200k bots, between 50k and 100k bots changed their IP addresses during the 80 minutes between attacks. This resulted in 1.2M unique IP addresses during the 16 spikes happening over 24 hours.
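A back-of-the-envelope check of those numbers can be sketched as follows. It assumes a deliberately simple model that is not stated in the measurements themselves: the first spike contributes all of its roughly 200k addresses as new, and every later spike contributes the same number of previously unseen addresses. The function name new_ips_per_interval is hypothetical.

```c
/*
 * Average number of previously unseen IP addresses appearing in each
 * interval between spikes, under the simple model described above.
 * Returns 0 when there is no interval to average over.
 */
static double new_ips_per_interval(double total_unique,
                                   double unique_per_spike,
                                   int spikes)
{
    if (spikes <= 1)
        return 0.0;
    return (total_unique - unique_per_spike) / (double)(spikes - 1);
}
```

With 1.2M total addresses, 200k per spike and 16 spikes this gives (1,200,000 - 200,000) / 15, or about 66,700 new addresses per 80-minute interval, which sits inside the 50k to 100k range estimated above.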
A botnet?
These spikes were unusual for a number of reasons.
They were generated by a large number of IP addresses. We estimate 200k concurrent bots.
The bots were rotating IP addresses aggressively.
The bots were from around the world with an emphasis on South America and North Africa.
The traffic generated was enormous, reaching 1M junk connections per second.
The spikes happened exactly every 80 minutes and lasted for 10 minutes.
The payload of the traffic was junk, not a usual HTTP request attack.
The payload sizes were, for the most part, uniformly distributed.
It's hard to draw conclusions, but we can imagine two possible scenarios. It is possible these spikes were an attack intended to break our HTTP servers.
A second possibility is that these spikes were legitimate connection attempts by some weird, obfuscated protocol. For some reason the clients were connecting to port 80/TCP and retried precisely every 80 minutes.
We are continuing our investigation. In the meantime we are looking for clues. Please do let us know if you have encountered this kind of TCP/IP payload. We are puzzled by these large spikes.
If you'd like to work on this type of problem we're hiring in London, San Francisco, Austin, Champaign and Singapore.
Update: A Twitter user pointed out that the sequence a6 ef 39 82 cb 15 5e a7 f0 3a 83 cc 16 5f a8 f1 3b appears in this set of test vectors, so we contacted the author, who was kind enough to reply and point us to the code that generated those vectors.
static void generateSimpleRawMaterial(unsigned char* data, unsigned int length, unsigned char seed1, unsigned int seed2)
{
    unsigned int i;
    for( i=0; i<length; i++) {
        unsigned int iRolled = i*seed1;
        unsigned char byte = (iRolled+length+seed2)%0xFF;
        data[i] = byte;
    }
}
Since we identified above that the difference between two bytes seemed to be 0x49 or 0x4a, it's worth looking at the difference between bytes in this algorithm. Simplifying, bytes are generated from:
((i * seed1) + length + seed2)%0xFF
Ignoring the % 0xFF for the moment, that's (i * seed1) + length + seed2. Taking the difference between two adjacent bytes (for i and i+1) gives a difference of just seed1.
Thus in our case it's likely that seed1 is 0x49. It's fairly easy to end up with the following code to generate the sequence:
seed = 0x49
byte = 0xa6
do
    output byte
    byte = (seed + byte) % 0xff
done
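The recurrence above can be checked against the bytes quoted earlier in this post with a short C sketch. The names OBSERVED and matches_recurrence are ours and purely illustrative; the only assumption carried over from the text is that each byte is the previous one plus a fixed seed, reduced modulo 0xff.

```c
#include <stddef.h>

/* The 17 bytes of the sequence quoted earlier in the post. */
static const unsigned char OBSERVED[17] = {
    0xa6, 0xef, 0x39, 0x82, 0xcb, 0x15, 0x5e, 0xa7, 0xf0,
    0x3a, 0x83, 0xcc, 0x16, 0x5f, 0xa8, 0xf1, 0x3b
};

/*
 * Returns 1 if data[i + 1] == (data[i] + seed) % 0xff holds for every
 * adjacent pair in the buffer, and 0 as soon as one pair disagrees.
 */
static int matches_recurrence(const unsigned char *data, size_t n,
                              unsigned int seed)
{
    for (size_t i = 0; i + 1 < n; i++) {
        if ((data[i] + seed) % 0xff != (unsigned int)data[i + 1])
            return 0;
    }
    return 1;
}
```

matches_recurrence(OBSERVED, 17, 0x49) returns 1, while a seed of 0x4a fails on the very first pair. The reduction modulo 0xff (255) rather than 256 also accounts for the occasional 0x4a step seen earlier: whenever the sum wraps, subtracting 255 instead of 256 leaves the result one higher than plain 8-bit overflow would.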
One big mystery remaining is 'what's the 0x75 at the start of the junk data?'.
Yes, we're aware that porcupines have spines/quills, not spikes.