A deep look at CVE-2015-5477 and how CloudFlare Virtual DNS customers are protected

Last week ISC published a patch for a critical remotely exploitable vulnerability in the BIND9 DNS server capable of causing a crash with a single packet.

CC BY 2.0 image by Ralph Aversen

The public summary tells us that a mistake in handling of queries for the TKEY type causes an assertion to fail, which in turn crashes the server. Since the assertion happens during the query parsing, there is no way to avoid it: it's the first thing that happens on receiving a packet, before any decision is made about what to do with it.

TKEY queries are used in the context of TSIG, a protocol DNS servers can use to authenticate to each other. They are special in that unlike normal DNS queries they include a “meta” record (of type TKEY) in the EXTRA/ADDITIONAL section of the message.

CC BY 2.0 image by Ralph Aversen

Since the exploit packet is now public, I thought we might take a dive and look at the vulnerable code. Let's start by taking a look at the output of a crashing instance:

03-Aug-2015 16:38:55.509 message.c:2352: REQUIRE(*name == ((void*)0)) failed, back trace
03-Aug-2015 16:38:55.510 #0 0x10001510d in assertion_failed()+0x5d
03-Aug-2015 16:38:55.510 #1 0x1001ee56a in isc_assertion_failed()+0xa
03-Aug-2015 16:38:55.510 #2 0x1000bc31d in dns_message_findname()+0x1ad
03-Aug-2015 16:38:55.510 #3 0x10017279c in dns_tkey_processquery()+0xfc
03-Aug-2015 16:38:55.510 #4 0x100016945 in ns_query_start()+0x695
03-Aug-2015 16:38:55.510 #5 0x100008673 in client_request()+0x18d3
03-Aug-2015 16:38:55.510 #6 0x1002125fe in run()+0x3ce
03-Aug-2015 16:38:55.510 exiting (due to assertion failure)
[1]    37363 abort (core dumped)  ./bin/named/named -f -c named.conf

This is extremely helpful--after all this is a controlled crash caused by a failed assertion--and tells us what failed and where: message.c:2352. Here's the excerpt.

// https://source.isc.org/git/bind9.git -- faa3b61 -- lib/dns/message.c

    isc_result_t
    dns_message_findname(dns_message_t *msg, dns_section_t section,
                 dns_name_t *target, dns_rdatatype_t type,
                 dns_rdatatype_t covers, dns_name_t **name,
                 dns_rdataset_t **rdataset)
    {
        dns_name_t *foundname;
        isc_result_t result;
    
        /*
         * XXX These requirements are probably too intensive, especially
         * where things can be NULL, but as they are they ensure that if
         * something is NON-NULL, indicating that the caller expects it
         * to be filled in, that we can in fact fill it in.
         */
        REQUIRE(msg != NULL);
        REQUIRE(VALID_SECTION(section));
        REQUIRE(target != NULL);
        if (name != NULL)
==>         REQUIRE(*name == NULL);

    [...]

What we have here is a function "dns_message_findname" that searches for an RRset with the given name and type in the given message section. It employs a really common C API: to get the results the caller passes pointers that will be filled in (dns_name_t **name, dns_rdataset_t **rdataset).

CC BY 2.0 image by Ralph Aversen

As the big comment ironically acknowledges, it's really strict when validating these pointers: if they don't point to (dns_name_t *)NULL the REQUIRE assertion will fail and the server will crash with no attempt at recovery. Code calling this function must take extra care to pass a pointer to a NULL dns_name_t *, which the function will fill in to return the found name.

In not-memory safe languages is not uncommon to crash when a programmer assertion is violated, because a program might not be able to cleanup its own memory after something that is not supposed to happen happens.

So we continue our investigation by climbing up the stack trace to find the illegal call. Next step is dns_tkey_processquery. Here is a simplified excerpt.

// https://source.isc.org/git/bind9.git -- faa3b61 -- lib/dns/tkey.c

isc_result_t
dns_tkey_processquery(dns_message_t *msg, dns_tkeyctx_t *tctx,
              dns_tsig_keyring_t *ring)
{
    isc_result_t result = ISC_R_SUCCESS;
    dns_name_t *qname, *name;
    dns_rdataset_t *tkeyset;

    /*
     * Interpret the question section.
     */
    result = dns_message_firstname(msg, DNS_SECTION_QUESTION);
    if (result != ISC_R_SUCCESS)
        return (DNS_R_FORMERR);

    qname = NULL;
    dns_message_currentname(msg, DNS_SECTION_QUESTION, &qname);

    /*
     * Look for a TKEY record that matches the question.
     */
    tkeyset = NULL;
    name = NULL;
    result = dns_message_findname(msg, DNS_SECTION_ADDITIONAL, qname,
                      dns_rdatatype_tkey, 0, &name, &tkeyset);
    if (result != ISC_R_SUCCESS) {
        /*
         * Try the answer section, since that's where Win2000
         * puts it.
         */
        if (dns_message_findname(msg, DNS_SECTION_ANSWER, qname,
                     dns_rdatatype_tkey, 0, &name,
                     &tkeyset) != ISC_R_SUCCESS) {
            result = DNS_R_FORMERR;
            tkey_log("dns_tkey_processquery: couldn't find a TKEY "
                 "matching the question");
            goto failure;
        }
    }

[...]

There are two dns_message_findname calls here. Since we are looking for the one that passes a dirty name we can ignore the first one which is preceded by an explicit name = NULL;.

The second call is more interesting. The same dns_name_t *name is reused without resetting it to NULL after the previous dns_message_findname call. This must be where the bug is.

CC BY 2.0 image by Ralph Aversen

Now the question is: when would dns_message_findname set name but not return ISC_R_SUCCESS (so that the if is satisfied)? Let's have a look at the full function body now.

// https://source.isc.org/git/bind9.git -- faa3b61 -- lib/dns/message.c

isc_result_t
dns_message_findname(dns_message_t *msg, dns_section_t section,
             dns_name_t *target, dns_rdatatype_t type,
             dns_rdatatype_t covers, dns_name_t **name,
             dns_rdataset_t **rdataset)
{
    dns_name_t *foundname;
    isc_result_t result;

    /*
     * XXX These requirements are probably too intensive, especially
     * where things can be NULL, but as they are they ensure that if
     * something is NON-NULL, indicating that the caller expects it
     * to be filled in, that we can in fact fill it in.
     */
    REQUIRE(msg != NULL);
    REQUIRE(VALID_SECTION(section));
    REQUIRE(target != NULL);
    if (name != NULL)
        REQUIRE(*name == NULL);
    if (type == dns_rdatatype_any) {
        REQUIRE(rdataset == NULL);
    } else {
        if (rdataset != NULL)
            REQUIRE(*rdataset == NULL);
    }

    result = findname(&foundname, target,
              &msg->sections[section]);

    if (result == ISC_R_NOTFOUND)
        return (DNS_R_NXDOMAIN);
    else if (result != ISC_R_SUCCESS)
        return (result);

    if (name != NULL)
        *name = foundname;

    /*
     * And now look for the type.
     */
    if (type == dns_rdatatype_any)
        return (ISC_R_SUCCESS);

    result = dns_message_findtype(foundname, type, covers, rdataset);
    if (result == ISC_R_NOTFOUND)
        return (DNS_R_NXRRSET);

    return (result);
}

As you can see dns_message_findname uses first findname to match the records with the target name, and then dns_message_findtype to match the target type. In between the two calls... *name = foundname! So if dns_message_findname can find a record with name == qname in DNS_SECTION_ADDITIONAL but then it turns out not to have type dns_rdatatype_tkey, name will be filled in and a failure returned. The second dns_message_findname call will trigger on the dirty name and... boom.

CC BY 2.0 image by Ralph Aversen

Indeed, the patch just adds name = NULL before the second call. (No, we couldn't have started our investigation from the patch; what's the fun in that!?)

diff --git a/lib/dns/tkey.c b/lib/dns/tkey.c
index 66210d5..34ad90b 100644
--- a/lib/dns/tkey.c
+++ b/lib/dns/tkey.c
@@ -654,6 +654,7 @@ dns_tkey_processquery(dns_message_t *msg, dns_tkeyctx_t *tctx,
 		 * Try the answer section, since that's where Win2000
 		 * puts it.
 		 */
+		name = NULL;
 		if (dns_message_findname(msg, DNS_SECTION_ANSWER, qname,
 					 dns_rdatatype_tkey, 0, &name,
 					 &tkeyset) != ISC_R_SUCCESS) {

To recap, here is the bug flow:

a query for type TKEY is received, dns_tkey_processquery is called to parse it
dns_message_findname is called a first time on the EXTRA section
a record with the same name as the query is found in the EXTRA section, causing name to be filled, but it's not a TKEY record, causing result != ISC_R_SUCCESS
dns_message_findname is called a second time to look in the ANS section, and it is passed the now dirty name reference
the assertion *name != NULL fails, BIND crashes

This bug was found with the amazing american fuzzy lop fuzzer by @jfoote_. A fuzzer is an automated tool that keeps feeding automatically mutated inputs to a target program until it crashes. You can see how it eventually stumbled upon the TKEY query + non-TKEY EXTRA RR combo and found this bug.

Virtual DNS customers have always been protected

Good news! CloudFlare Virtual DNS customers have always been protected from this attack, even if they run BIND. Our custom Go DNS server, RRDNS, parses and sanitizes all queries before forwarding them to the origin servers if needed.

Since Virtual DNS does not support TSIG and TKEY (which are meant to authenticate server-to-server traffic, not recursive lookups) it has no reason to relay EXTRA section records in queries, so it doesn't! That reduces the attack surface and indeed makes it impossible to exploit this vulnerability through Virtual DNS.

No special rules are in place to protect from this specific vulnerability: RRDNS always validates incoming packets, making sure they look like regular queries, and strips them down to the most simple form possible before relaying them.

The Cloudflare Blog

A deep look at CVE-2015-5477 and how CloudFlare Virtual DNS customers are protected

Virtual DNS customers have always been protected

Vulnerability transparency: strengthening security through responsible disclosure

QUIC restarts, slow problems: udpgrm to the rescue

Scaling with safety: Cloudflare's approach to global service health metrics and software releases

Some TXT about, and A PTR to, new DNS insights on Cloudflare Radar