Recent headline-grabbing DDoS attacks provoked heated debates in the DNS community. Everyone has strong opinions on how to harden DNS to avoid downtime in the future. Is it better to use a single DNS provider or multiple? What DNS TTL values are best? Does DNSSEC make you more or less exposed?
These are valid questions worth serious discussion, but tuning your own DNS server settings is not the full story. Together, as a community, we need to harden the DNS protocol itself. We need to prepare it to withstand the toughest DDoS attacks the future will surely bring. In this blog post I'll point out an obscure feature in the core DNS protocol. It is not practical to use this "hidden" feature for DDoS mitigation now, but with a small tweak it could become extremely useful. The feature is currently unused not because of protocol problems - it's unused because of the DNS Top Level Domain (TLD) operators' apathy. If it were working, it would reduce DDoS recovery time for the DNS servers under attack.
The feature in question is: DNS TLD glue records. More specifically DNS TLD glue records with custom TTL values.
DNS glue is one of the least understood quirks in the DNS protocol. Allow me to explain why I think reducing glue TTL is a good idea.
But first: what is glue anyway?
DNS glue is a solution to "the chicken or the egg problem" that is inherent in DNS. It's easiest to explain it with a concrete example.
Imagine you want to resolve the cloudflare.net domain. For that you ask your local recursive DNS server for the resolution. OK, but that doesn't answer the question: what does the recursor actually do?
For simplicity let's make a couple of assumptions:
- Our recursor doesn't have any data cached for cloudflare.net.
- However, it does know that the .net TLD is handled by a number of nameservers, among them a.gtld-servers.net, whose IP address it already knows.
- We ignore the first steps and start our investigation by looking at the recursor when it queries the .net nameservers.
To resolve cloudflare.net the recursor needs to figure out which nameservers host the cloudflare.net data - or in DNS speak: which nameservers are authoritative for that zone?
To do so, the recursor asks the .net nameserver. Let's assume we know that one of these is 184.108.40.206. The recursor will launch a query which we can simulate with this dig command:
    $ dig cloudflare.net @220.127.116.11
    [ output truncated for brevity ]
    ;; AUTHORITY SECTION:
    cloudflare.net. 172800 IN NS ns1.cloudflare.net.
    ;; ADDITIONAL SECTION:
    [ skipped for now ]
We politely asked one of the .net nameservers: where can I find cloudflare.net? The answer is: I don't know, but I know who to ask! Go talk to ns1.cloudflare.net, it knows all about the cloudflare.net zone.
This is called "a delegation". .net told us to go away and ask ns1.cloudflare.net.
Hold on, but where is ns1.cloudflare.net? What is its IP address? If we asked the .net nameserver, it would tell us the same thing - go and talk to ns1.cloudflare.net.
As you can see, here is a chicken and egg problem. To resolve cloudflare.net we need to resolve ns1.cloudflare.net. To resolve ns1.cloudflare.net we need to resolve ns1.cloudflare.net, and so on.
This is where DNS glue comes in. I lied a bit in the previous terminal output: the resolution of ns1.cloudflare.net is available in the response given by the .net nameserver. This time allow me to show the relevant "ADDITIONAL" section of the answer:
    $ dig cloudflare.net @18.104.22.168
    [ output truncated for brevity ]
    ;; AUTHORITY SECTION:
    cloudflare.net. 172800 IN NS ns1.cloudflare.net.
    ;; ADDITIONAL SECTION:
    ns1.cloudflare.net. 172800 IN A 22.214.171.124
To break the resolution loop we need the second bit of data in the answer - the ADDITIONAL SECTION. Here the .net server says: by the way, in case you wondered where ns1.cloudflare.net is, here is its IP address.
This is DNS glue. Conceptually it's a pretty weird invention. We are asking the authoritative nameservers of the .net zone for the resolution of cloudflare.net. In response we not only get the delegation information but also an address of the server. Think about it - it's as if a part of the cloudflare.net zone was handled by the .net TLD zone!
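The loop, and the way glue breaks it, can be sketched with a toy model. This is not how any real resolver is implemented. The `NET_ZONE` dictionary, both function names and the `192.0.2.53` address (from the documentation range, not Cloudflare's real IP) are assumptions made for illustration.

```python
# Toy model of a delegation, not a real resolver. All data is illustrative.
# 192.0.2.53 comes from the documentation range and is NOT a real nameserver IP.
NET_ZONE = {
    "delegations": {"cloudflare.net.": ["ns1.cloudflare.net."]},
    "glue": {"ns1.cloudflare.net.": "192.0.2.53"},
}


def find_delegation(zone, name):
    """Return (child_zone, nameservers) for the closest delegation covering name."""
    labels = name.split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in zone["delegations"]:
            return candidate, zone["delegations"][candidate]
    return None, []


def nameserver_address(zone, name, use_glue=True, depth=0, max_depth=5):
    """Find the address of a nameserver responsible for `name`.

    Without glue, the nameserver's own name has to be resolved through the
    same delegation, so the lookup never terminates. We cut it off at max_depth.
    """
    if depth >= max_depth:
        raise RecursionError(f"resolution loop while resolving {name}")
    child, nameservers = find_delegation(zone, name)
    if child is None:
        raise LookupError(f"no delegation for {name}")
    ns = nameservers[0]
    if use_glue and ns in zone["glue"]:
        # The glue record from the ADDITIONAL section breaks the loop.
        return zone["glue"][ns]
    # No glue: resolve the nameserver's name the normal way, which leads
    # straight back to the same delegation.
    return nameserver_address(zone, ns, use_glue, depth + 1, max_depth)


if __name__ == "__main__":
    print(nameserver_address(NET_ZONE, "cloudflare.net."))
    try:
        nameserver_address(NET_ZONE, "cloudflare.net.", use_glue=False)
    except RecursionError as exc:
        print("without glue:", exc)
```

With `use_glue=False` the function keeps being sent back to `ns1.cloudflare.net.` until the depth limit trips, which is the chicken and egg problem in miniature.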
How far can this go? Can there be arbitrary resolutions stuck in the ADDITIONAL SECTION? Will this work?
    $ dig cloudflare.net @126.96.36.199
    [ output truncated for brevity ]
    ;; AUTHORITY SECTION:
    cloudflare.net. 172800 IN NS ns1.cloudflare.net.
    ;; ADDITIONAL SECTION:
    ns1.cloudflare.net. 172800 IN A 188.8.131.52
    www.google.com 172800 IN A 184.108.40.206
The fun story is: it used to "work" and confuse recursors. This is precisely what the Kashpureff attack did in 1997.
This is a good old school DNS cache injection or cache poisoning attack. The recursor logic of interpreting DNS glue answers is pretty twisted. The details are poorly understood, and vary with every implementation. Conceptually the barrier between a valid glue record and cache injection is very thin. This is being actively discussed by the DNS gurus, see draft-fujiwara-dnsop-resolver-update-00 and draft-weaver-dnsext-comprehensive-resolver-00.
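One common defence against this kind of injection is a "bailiwick" check: a recursor only accepts additional records whose names fall inside the zone served by the nameserver it queried. The sketch below illustrates that idea under simplifying assumptions. The function names and the documentation-range addresses are hypothetical, and real resolvers apply more elaborate, implementation-specific rules.

```python
def in_bailiwick(name, zone):
    """True if `name` equals `zone` or is a subdomain of it (case-insensitive)."""
    name = name.lower().rstrip(".")
    zone = zone.lower().rstrip(".")
    if zone == "":
        return True  # everything is inside the root zone
    return name == zone or name.endswith("." + zone)


def filter_additional(server_zone, ns_names, additional):
    """Keep only address records that look like legitimate glue.

    server_zone: the zone the queried server is authoritative for, e.g. "net."
    ns_names:    nameserver names listed in the AUTHORITY section
    additional:  list of (owner, ttl, rtype, rdata) tuples

    Simplified rule (an assumption, not any specific resolver's logic):
    keep a record only if it is an address record, its owner is one of the
    delegated nameservers, and that owner lies inside server_zone.
    """
    wanted = {n.lower().rstrip(".") for n in ns_names}
    kept = []
    for owner, ttl, rtype, rdata in additional:
        key = owner.lower().rstrip(".")
        if rtype in ("A", "AAAA") and key in wanted and in_bailiwick(owner, server_zone):
            kept.append((owner, ttl, rtype, rdata))
    return kept


if __name__ == "__main__":
    # Placeholder addresses from documentation ranges, not real servers.
    records = [
        ("ns1.cloudflare.net.", 172800, "A", "192.0.2.53"),
        ("www.google.com", 172800, "A", "198.51.100.7"),
    ]
    for record in filter_additional("net.", ["ns1.cloudflare.net."], records):
        print(record)
```

Under this rule the `www.google.com` record is dropped because a `.net` server has no business answering for a `.com` name, while the genuine glue for `ns1.cloudflare.net.` is kept.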
What's the problem?
We've shown what DNS glue is, how it works, and why it is needed in the DNS protocol. Frankly speaking, DNS glue is a pretty ingenious solution to a real problem.
Let me now explain the problem. Let's take a look at the glue answer again:
    ;; ADDITIONAL SECTION:
    ns1.cloudflare.net. 172800 IN A 220.127.116.11
The problem is the TTL value. Here, you can see the TTL of that record is 172800 seconds = 48 hours. In normal situations a domain owner, in this case my colleague managing the cloudflare.net domain, has a way to configure the TTL of a DNS record. But 48 hours is not the value we intended to use! If you ask a cloudflare.net authoritative nameserver for this record you get a different TTL that's much shorter:
    $ dig ns1.cloudflare.net @18.104.22.168
    ns1.cloudflare.net. 900 IN A 22.214.171.124
You can see that the authoritative nameserver claims this record is valid for only 900 seconds = 15 minutes, not 48 hours!
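To make the gap concrete, here is a small sketch that parses the two answer lines and compares their TTLs. The `parse_record` helper is hypothetical, and the address is a placeholder from the documentation range rather than the one shown in the terminal output.

```python
def parse_record(line):
    """Parse a 'name ttl class type rdata' line as printed by dig."""
    name, ttl, rclass, rtype, rdata = line.split(None, 4)
    return {"name": name, "ttl": int(ttl), "class": rclass,
            "type": rtype, "rdata": rdata}


# Same record as seen from the TLD (glue) and from the authoritative server.
# 192.0.2.53 is a placeholder address.
glue = parse_record("ns1.cloudflare.net. 172800 IN A 192.0.2.53")
auth = parse_record("ns1.cloudflare.net. 900 IN A 192.0.2.53")

glue_hours = glue["ttl"] / 3600        # 48.0
auth_minutes = auth["ttl"] / 60        # 15.0
ratio = glue["ttl"] // auth["ttl"]     # 192

if __name__ == "__main__":
    print(f"glue TTL: {glue_hours:g} hours")
    print(f"authoritative TTL: {auth_minutes:g} minutes")
    print(f"the glue is cached {ratio}x longer")
```

In other words, a recursor that relies on the glue may keep the nameserver address 192 times longer than the zone owner asked for.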
Where does this discrepancy come from?
The glue records are usually managed in some kind of panel exposed by the registrar. This is fine; in the end, we inject part of the cloudflare.net namespace into the .net zone. But here's the problem: while there is a way to set the glue IP address, there is no way to configure the TTL. The glue TTL is hardcoded to 48 hours by the TLD operators.
I strongly believe this is way too long and hurts aggressive DDoS mitigation techniques.
Had that DNS glue TTL been smaller, it would be possible to rotate the nameserver IPs during an attack. In fact, at Cloudflare we use this technique at the HTTP layer all the time.
During significant attacks we have the ability to promptly move customer traffic between IP addresses by changing the DNS resolution of our customer orange-clouded domains (those we proxy). This allows us to shift legitimate traffic off attacked IP addresses, and deploy aggressive DDoS mitigations on them. In extreme cases we can BGP null route the targeted IPs with little customer impact. Internally we call this technique "scattering".
"Scattering" on the HTTP layer is very effective against L3 attacks. It is also possible to do scattering with no impact to customers, because we serve DNS records with low DNS TTL values.
But "scattering" could also be done on the DNS authoritative layer! During heavy L3 attacks against one of our DNS servers we'd love to move legitimate traffic off that attacked IP address.
"Scattering" on the DNS authoritative layer could be a powerful mitigation technique. It would work well against attacks in which packets from a botnet hit authoritative servers directly (as opposed to being reflected by legitimate DNS recursors). Unfortunately, it is impossible to do this "DNS auth scattering" because we don't have the power to adjust the TLD glue TTL values. With the TTL stuck at 48 hours, changing the nameserver IP addresses dynamically is not an option.
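A back-of-the-envelope model shows why the TTL matters so much here. It assumes, purely for illustration, that each recursor cached the old glue at a moment spread uniformly over the preceding TTL window and re-queries as soon as its entry expires. Real traffic is messier, and the function name is hypothetical.

```python
def fraction_moved(elapsed_s, ttl_s):
    """Estimated fraction of recursor caches that expired after an IP change.

    Assumption: cache fill times are spread uniformly over the last `ttl_s`
    seconds, so the expired share grows linearly and reaches 1.0 at `ttl_s`.
    """
    if ttl_s <= 0:
        raise ValueError("ttl_s must be positive")
    return min(max(elapsed_s, 0) / ttl_s, 1.0)


if __name__ == "__main__":
    fifteen_minutes = 15 * 60
    for label, ttl in (("48 hour glue TTL", 48 * 3600), ("15 minute TTL", 15 * 60)):
        share = fraction_moved(fifteen_minutes, ttl)
        print(f"{label}: {share:.1%} of caches refreshed after 15 minutes")
```

Under this simplified model, fifteen minutes after rotating a nameserver IP only about half a percent of recursors would have dropped the old address with a 48 hour TTL, whereas with a 15 minute TTL all of them would have.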
I believe this should be fixed.
While I strongly believe that short DNS TTLs are a good thing, others disagree.
An often-raised point is that short TTLs increase the load on DNS servers. This is certainly true, but as pointed out in this OARC presentation by the .nl operators, the impact is minimal. DNS servers must be heavily overprovisioned anyway to deal with attacks. In fact, the .nl operators have been serving a 1 hour glue TTL since the beginning of 2016 without issues.
In this blog post, Bozhidar Bozhanov argues that short TTLs in general are undesirable.
What matters is that the glue TTL should be configurable.
It's hard to prove the effectiveness of the "DNS auth scattering" technique since glue TTL is hardcoded at the lengthy 48 hours, but we tried to check it anyway. For a test we added a glue record and measured how long it took to pick up its share of the traffic.
We performed the experiment on the cloudflare.com domain. Here is a chart of traffic levels to two Cloudflare nameservers with glue already present, ns3 and ns6, and a new one we just added glue for, ns6-bis.
We added glue at 2200 UTC one day. It is nicely visible that the traffic on this IP address gradually increased as the caches on recursors worldwide expired. The traffic seems to have reached levels comparable with other glue nameservers at about 1600 to 1800 the next day - around 8 hours later.
There is at least an 8 hour delay before a big chunk of DNS resolvers will pick up a new glued IP. The maximum time for the full switch is, of course, 48 hours.
We must use every possible technique in order to make the Internet's DNS infrastructure more resilient against DDoS attacks. We may need to improve the core DNS protocol (aggressive NSEC caching), tune the defaults (advocate the use of low TTLs) and share advanced mitigation techniques (scattering).
In this article, I explained what DNS glue is, and why I believe that DNS TLD glue TTL values hardcoded at 48 hours are not helping with DDoS mitigation. I hope this article will serve as a call to action for relevant TLD operators. I believe the ability to adjust DNS glue TTLs is a simple yet effective way to make DNS infrastructure more reliable.