Almost a year ago, we announced that we were going to stop answering DNS ANY queries. We were prompted by a number of factors:
The lack of legitimate ANY use.
The abundance of malicious ANY use.
The constant use of ANY queries in large DNS amplification DDoS attacks.
Additionally, we were about to launch Universal DNSSEC, and we could foresee the high cost of assembling ANY answers and providing DNSSEC-on-the-fly for those answers, especially when most of the time, those ANY answers were for malicious, illegitimate, clients.
Although we usually make a tremendous effort to maintain backwards compatibility across Internet protocols (recently, for example, continuing to support SHA-1-based SSL certificates), it was clear to us that the DNS ANY query was something that was better removed from the Internet than maintained for general use.
Our proposal at the time was to return an ERROR code to the querier telling them that ANY was not supported, and this sparked a robust discussion in the DNS protocol community. In this blog post, we’ll cover what has happened and what our final plan is.
Just before we published our blog a popular software started using ANY queries, to get all address records for a name -- something that ANY isn’t actually designed to do. The effect of this software was that our steady ANY query load had grown from few hundred per second to tens of thousands in a matter of days. Luckily, the software in question issued a revised version that did not use ANY, and our steady ANY query load returned to the old level.
CC BY-SA 2.0 image by Liam Quinn
The Conversation in the DNS Community
As that was happening, a lively discussion started to form among those in the DNS community. The first fundamental question that had to be answered was: “Does ANY mean ALL?”. That is, was an ANY query meant to be a way to receive all of the records in a zone for the query name?
Different people had different interpretations of ANY depending on the kind of DNS service they were providing. For example, it is nice as an operator of a DNS resolver to be able to ask your own resolver, “what is stored in the resolver memory/cache for a particular name?”. An ANY query with the right query flags can help answer that. On the other hand, an operator of an authoritative server does not need that functionality, as AXFR provides another reliable way to get that information. In short, the ANY query is a nice tool for people that are trying to understand what is going on in DNS resolution. Some community members argued that ANY should be a restricted query to a privileged few that have the need to know.
While ANY can be a nice tool to debug and expose inconsistencies in resolver caches, it’s still not a great tool: with more and more Anycast resolver clusters, there is no guarantee that two subsequent queries will hit the same resolver.
Why is answering ANY expensive for some DNS providers like CloudFlare?
Our in-house DNS server is optimized to provide dynamic answers to questions. For example, depending on how a CNAME is configured, our server may return the CNAME, fetch the real answer from the target of the CNAME (what we call CNAME Flattening), or provide CloudFlare addresses as answers. Thus for us to answer an ANY query, we need to compute all of the possible combinations just to know what to return.
Beyond that, our DNSSEC implementation signs answers on-the-fly. Thus, returning an answer with many different types of DNS records in the answer requires signing all of them at the edge. Thus providing support for ANY in the “traditional” sense had serious computational and response time implications. This is not unique to CloudFlare; we are not the only DNS implementation that has this high cost factor in answering ANY queries.
The use of ANY queries in DDoS attacks
In a recent paper by Akamai the authors draw the conclusion that DNSSEC is the main cause of large answers used for DDoS attacks. But looking at the packet capture they included in the paper, it’s clear the real cause of the large answers is that the attackers use ANY queries to maximize the amplification factor.
CC BY-SA 2.0 image by Vladimer Shioshvili
We regularly see attacks that attempt to use our powerful DNS system as a source of reflection. We have in response to this created sophisticated systems to detect the attacks and mitigate them in a responsible manner. Our deprecation of ANY is a key part of those protections. One of our main mantra is “do not return larger than needed answers”, exactly to help protect others on the Internet from amplification attacks.
Evolution of “Suppress ANY” in the DNS protocol community
Soon after we announced our planned deprecation of ANY, we submitted an Internet Draft to the IETF proposing to restrict ANY queries to authorized parties only. In the resulting discussion, it became clear that what mattered here was how we deprecated ANY. To make sure no system was adversely affected, we had to take into account how various applications and DNS implementations were using ANY.
In short, there are two main uses of ANY queries:
Some programs use ANY as a probabilistic optimization attempt to get the answers they need.
Some use ANY to debug DNS resolvers when things go wrong.
However, neither of those use cases are satisfied by the current ANY landscape. There is a lack of common behavior among resolvers as to how they treat answers that are cached as a result of an ANY query. Some will return this data when a more specific query matches, while other resolvers will fetch the exact requested data, even though they already have it in cache from an ANY query. This is because resolvers interpret RFC2181 section 5.4.1 (Ranking Data) in differing ways, and some resolvers were written without applying the rules from RFC2181 at all. Some resolvers will not forward ANY queries to authoritative servers if there exists a single RRset for that name in the resolver’s cache, and others will forward it unless they have a prior ANY answer in the cache. Furthermore, some resolvers will only reuse the results of an ANY query to answer other ANY queries, queries for all other types will result in a direct query for the type requested.
In many cases, the ANY query results in an answer that is too large to fit in the UDP packet size requested, resulting in a truncated answer, leading to a follow up via TCP. For a while, the DNS community believed that returning truncated answers would stop attacks, but in reality, that will only mitigate simple attacks using forged packets. In attacks that are reflected via open resolvers, returning truncated packets will not work because the open resolvers are happy to fall back to TCP if the UDP answer sets the truncate bit.
So, there is no common understanding of how the ANY query should be treated to minimize its amplification potential. To be fair, the ANY query is a special type of query called a meta-query i.e. it is not an actual type. Nevertheless, the community was divided into two camps: “ANY == ALL” and “ANY != ALL”.
The community was also further divided into groups of “ANY is ok for everyone” and “ANY should be restricted to “good” clients”. Our position from the beginning was that “ANY != ALL” and we were looking for a way to help curb the number of large amplified attacks on the Internet that used ANY.
Over a few months, we engaged in a number of experiments to see how different DNS systems reacted to different non-ANY responses to the ANY query. After a fair amount of experimentation and discussions with colleagues around the world, we decided on an approach that is recursive resolver centric. What we wanted to do was to give answers that are friendly to recursive resolvers, i.e. we give them something small that they can cache and return to repeated ANY queries. Returning an error to a recursive resolver was not a good option, as the resolver will just ask the next authoritative server and visit all the authoritative servers before giving up.
We also wanted to avoid guessing the intention of the originator of the query, which is why we did not follow one proposal to give out the A+AAAA+MX records or a CNAME if one existed. We do not like that, as the answer is bigger than it has to be and there is more data in the answer than the originator wanted.
For example, consider an email server that wants an MX record if one exists, but will fallback to an IP address if the MX does not exist. Instead, we decided to return what we call a “harmless” answer––an answer that is not useful for any application on the Internet. We selected an old DNS record type that is not used much anymore, but has the nice property that all test tools display as text: HINFO. This approach is documented in the current Internet Draft Refuse ANY draft that was adopted by the DNSOP working group of the IETF that handles DNS protocol issues.
As you can see below, when asked for ANY, we only return one HINFO record and the optional RRSIG that is only needed when the zone is signed. This record can be cached, and has the added benefit of being small and therefore reducing the amplification factor the attacker expected.
; <<>> DiG 9.8.3-P1 <<>> @ns2.p31.dynect.net. amazon.com. any +dnssec +norec ; (1 server found) ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 29671 ;; flags: qr aa; QUERY: 1, ANSWER: 16, AUTHORITY: 0, ADDITIONAL: 1 ;; OPT PSEUDOSECTION: ; EDNS: version: 0, flags: do; udp: 4096 ;; QUESTION SECTION: ;amazon.com. IN ANY ;; ANSWER SECTION: amazon.com. 900 IN SOA dns- external-master.amazon.com. root.amazon.com. 2010113317 180 60 3024000 60 amazon.com. 3600 IN NS pdns6.ultradns.co.uk. amazon.com. 3600 IN NS pdns1.ultradns.net. amazon.com. 3600 IN NS ns2.p31.dynect.net. amazon.com. 3600 IN NS ns3.p31.dynect.net. amazon.com. 3600 IN NS ns1.p31.dynect.net. amazon.com. 3600 IN NS ns4.p31.dynect.net. amazon.com. 60 IN A 220.127.116.11 amazon.com. 60 IN A 18.104.22.168 amazon.com. 60 IN A 22.214.171.124 amazon.com. 60 IN A 126.96.36.199 amazon.com. 60 IN A 188.8.131.52 amazon.com. 60 IN A 184.108.40.206 amazon.com. 900 IN MX 5 amazon-smtp.amazon.com. amazon.com. 900 IN TXT "spf2.0/pra include:spf1.amazon.com include:spf2.amazon.com include:amazonses.com -all" amazon.com. 900 IN TXT "v=spf1 include:spf1.amazon.com include:spf2.amazon.com include:amazonses.com -all" ;; Query time: 30 msec ;; SERVER: 220.127.116.11#53(18.104.22.168) ;; WHEN: Wed Apr 13 09:57:59 2016 ;; MSG SIZE rcvd: 565
Unsigned answer from large internet company is 565 bytes long.
In contrast a signed answer from CloudFlare.com is only 224 bytes long or less than ½ the unsigned answer above.
; <<>> DiG 9.8.3-P1 <<>> cloudflare.com any @ns3.cloudflare.com +dnssec +norec ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36238 ;; flags: qr aa; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1 ;; OPT PSEUDOSECTION: ; EDNS: version: 0, flags: do; udp: 512 ;; QUESTION SECTION: ;cloudflare.com. IN ANY ;; ANSWER SECTION: cloudflare.com. 3789 IN HINFO "Please stop asking for ANY" "See draft-ietf-dnsop-refuse-any" cloudflare.com. 3789 IN RRSIG HINFO 13 2 3789 20160414100147 20160412080147 35273 cloudflare.com. lGyCY7IC5sgHBfE95IJXDUS4diFjE5kq4vNMhhqP6+2+NyTQh1zAh1qw 3C710mFvvuCWe4VyRiqlu1jUzMnuLg== ;; Query time: 80 msec ;; SERVER: 2400:cb00:2049:1::a29f:21#53(2400:cb00:2049:1::a29f:21) ;; WHEN: Wed Apr 13 10:01:47 2016 ;; MSG SIZE rcvd: 224
We have been returning the HINFO answer for ANY queries since October 2015, with very few reports of problems in the field, besides a single Twitter rant about us not understanding DNS.
A few other DNS server vendors and DNS operators have followed our lead or adopted a similar line of defense. The latest example is the University of Cambridge which modified their BIND implementation to only return a single RRset for an ANY query. A common DNS server (NSD) has also been limiting ANY responses to only A+AAAA+MX types and/or CNAME, and a patch to return an even smaller answer has been proposed.
The moral is that the ANY query is not a useful tool for most DNS operators, but it is a wonderful tool if one is in the business of generating attacks against anyone else on the Internet. Answering ANY queries with a giant answer is not helping mitigate the plague of DoS volume attacks.
CloudFlare has taken a step to make the Internet a less hostile place and is leading by example. We strongly urge others to follow in our steps and neuter the amplification factor that a single DNS query can achieve. We all want to build a safer Internet, and neutering ANY query is one small step that everyone can take.