
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Sun, 05 Apr 2026 19:12:27 GMT</lastBuildDate>
        <item>
            <title><![CDATA[TLS 1.3 explained by the Cloudflare Crypto Team at 33c3]]></title>
            <link>https://blog.cloudflare.com/tls-1-3-explained-by-the-cloudflare-crypto-team-at-33c3/</link>
            <pubDate>Wed, 01 Feb 2017 14:57:00 GMT</pubDate>
            <description><![CDATA[ Nick Sullivan and I gave a talk about TLS 1.3 at 33c3, the latest Chaos Communication Congress. The congress, attended by more than 13,000 hackers in Hamburg, has been one of the hallmark events of the security community for more than 30 years. ]]></description>
            <content:encoded><![CDATA[ <p><a href="/author/nick-sullivan/">Nick Sullivan</a> and I gave a talk about <a href="/tag/tls%201.3/">TLS 1.3</a> at <a href="https://events.ccc.de/tag/33c3/">33c3</a>, the latest Chaos Communication Congress. The congress, attended by more than 13,000 hackers in Hamburg, has been one of the hallmark events of the security community for more than 30 years.</p><p>You can watch the recording, or <a href="https://media.ccc.de/v/33c3-8348-deploying_tls_1_3_the_great_the_good_and_the_bad">download it in multiple formats and languages on the CCC website</a>.</p><p>The talk introduces TLS 1.3 and explains how it works in technical detail, why it is faster and more secure, and touches on its history and current status.</p><p>The <a href="https://speakerdeck.com/filosottile/tls-1-dot-3-at-33c3">slide deck is also online</a>.</p><p>This was an expanded and updated version of the <a href="/tls-1-3-overview-and-q-and-a/">internal talk previously transcribed on this blog</a>.</p>
    <div>
      <h3>TLS 1.3 hits Chrome and Firefox Stable</h3>
      <a href="#tls-1-3-hits-chrome-and-firefox-stable">
        
      </a>
    </div>
    <p>In related news, TLS 1.3 is reaching a percentage of Chrome and Firefox users this week, so websites with the Cloudflare TLS 1.3 beta enabled will load faster and more securely for all those new users.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/lIyLFsHXlAipFcgZ1nPWr/e71e81c8a7849214051b75430e1c169e/Screen-Shot-2017-01-30-at-20.14.53.png" />
            
            </figure><p>You can enable the TLS 1.3 beta from the Crypto section of your control panel.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7jji24riIIZQ2OEC6Xc93r/88d0ae02211b14fd407c065c5880ad31/image00.png" />
            
            </figure> ]]></content:encoded>
            <category><![CDATA[TLS 1.3]]></category>
            <category><![CDATA[TLS]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Chrome]]></category>
            <category><![CDATA[Firefox]]></category>
            <category><![CDATA[Beta]]></category>
            <category><![CDATA[Cryptography]]></category>
            <guid isPermaLink="false">2zdHVDhrFKGUtMgVjYallG</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[So you want to expose Go on the Internet]]></title>
            <link>https://blog.cloudflare.com/exposing-go-on-the-internet/</link>
            <pubDate>Mon, 26 Dec 2016 14:59:01 GMT</pubDate>
            <description><![CDATA[ Back when crypto/tls was slow and net/http young, the general wisdom was to always put Go servers behind a reverse proxy like NGINX. That's not necessary anymore! ]]></description>
            <content:encoded><![CDATA[ <p><i>This piece was </i><a href="https://blog.gopheracademy.com/advent-2016/exposing-go-on-the-internet/"><i>originally written</i></a><i> for the Gopher Academy advent series. We are grateful to them for allowing us to republish it here.</i></p><p>Back when <code>crypto/tls</code> was slow and <code>net/http</code> young, the general wisdom was to always put Go servers behind a reverse proxy like NGINX. That's not necessary anymore!</p><p>At Cloudflare we recently experimented with exposing pure Go services to the hostile wide area network. With the Go 1.8 release, <code>net/http</code> and <code>crypto/tls</code> proved to be stable, performant and flexible.</p><p>However, the defaults are tuned for local services. In this article we'll see how to tune and harden a Go server for Internet exposure.</p>
    <div>
      <h3><code>crypto/tls</code></h3>
      <a href="#crypto-tls">
        
      </a>
    </div>
    <p>You're not running an insecure HTTP server on the Internet in 2016. So you need <code>crypto/tls</code>. The good news is that it's <a href="/go-crypto-bridging-the-performance-gap/">now really fast</a> (as you've seen in a <a href="https://blog.gopheracademy.com/advent-2016/tls-termination-bench/">previous advent article</a>), and its security track record so far is excellent.</p><p>The default settings resemble the <i>Intermediate</i> recommended configuration of the <a href="https://wiki.mozilla.org/Security/Server_Side_TLS">Mozilla guidelines</a>. However, you should still set <code>PreferServerCipherSuites</code> to ensure safer and faster cipher suites are preferred, and <code>CurvePreferences</code> to avoid unoptimized curves: a client using <code>CurveP384</code> would cause up to a second of CPU to be consumed on our machines.</p>
            <pre><code>&amp;tls.Config{
	// Causes servers to use Go's default ciphersuite preferences,
	// which are tuned to avoid attacks. Does nothing on clients.
	PreferServerCipherSuites: true,
	// Only use curves which have assembly implementations
	CurvePreferences: []tls.CurveID{
		tls.CurveP256,
		tls.X25519, // Go 1.8 only
	},
}</code></pre>
            <p>If you can take the compatibility loss of the <i>Modern</i> configuration, you should then also set <code>MinVersion</code> and <code>CipherSuites</code>.</p>
            <pre><code>	MinVersion: tls.VersionTLS12,
	CipherSuites: []uint16{
		tls.TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,
		tls.TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
		tls.TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305, // Go 1.8 only
		tls.TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,   // Go 1.8 only
		tls.TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
		tls.TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,

		// Best disabled, as they don't provide Forward Secrecy,
		// but might be necessary for some clients
		// tls.TLS_RSA_WITH_AES_256_GCM_SHA384,
		// tls.TLS_RSA_WITH_AES_128_GCM_SHA256,
	},</code></pre>
            <p>Be aware that the Go implementation of the CBC cipher suites (the ones we disabled in <i>Modern</i> mode above) is vulnerable to the <a href="https://www.imperialviolet.org/2013/02/04/luckythirteen.html">Lucky13 attack</a>, even though <a href="https://github.com/golang/go/commit/f28cf8346c4ce7cb74bf97c7c69da21c43a78034">partial countermeasures were merged in 1.8</a>.</p><p>A final caveat: all these recommendations apply only to the amd64 architecture, for which <a href="/go-crypto-bridging-the-performance-gap/">fast, constant time implementations</a> of the crypto primitives (AES-GCM, ChaCha20-Poly1305, P256) are available. Other architectures are probably not fit for production use.</p><p>Since this server will be exposed to the Internet, it will need a publicly trusted certificate. You can get one easily and for free thanks to Let's Encrypt and the <a href="https://godoc.org/golang.org/x/crypto/acme/autocert"><code>golang.org/x/crypto/acme/autocert</code></a> package’s <code>GetCertificate</code> function.</p><p>Don't forget to redirect HTTP page loads to HTTPS, and consider <a href="https://www.owasp.org/index.php/HTTP_Strict_Transport_Security_Cheat_Sheet">HSTS</a> if your clients are browsers.</p>
            <pre><code>srv := &amp;http.Server{
	ReadTimeout:  5 * time.Second,
	WriteTimeout: 5 * time.Second,
	Handler: http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		w.Header().Set("Connection", "close")
		url := "https://" + req.Host + req.URL.String()
		http.Redirect(w, req, url, http.StatusMovedPermanently)
	}),
}
go func() { log.Fatal(srv.ListenAndServe()) }()</code></pre>
            <p>You can use the <a href="https://www.ssllabs.com/ssltest/">SSL Labs test</a> to check that everything is configured correctly.</p>
    <div>
      <h3><code>net/http</code></h3>
      <a href="#net-http">
        
      </a>
    </div>
    <p><code>net/http</code> is a mature HTTP/1.1 and HTTP/2 stack. You probably know how (and have opinions about how) to use the Handler side of it, so that's not what we'll talk about. We will instead talk about the Server side and what goes on behind the scenes.</p>
    <div>
      <h3>Timeouts</h3>
      <a href="#timeouts">
        
      </a>
    </div>
    <p>Timeouts are possibly the most dangerous edge case to overlook. Your service might get away with it on a controlled network, but it will not survive on the open Internet, especially (but not only) if maliciously attacked.</p><p>Applying timeouts is a matter of resource control. Even if goroutines are cheap, file descriptors are always limited. A connection that is stuck, is not making progress, or is maliciously stalling should not be allowed to consume them.</p><p>A server that has run out of file descriptors will fail to accept new connections with errors like</p>
            <pre><code>http: Accept error: accept tcp [::]:80: accept: too many open files; retrying in 1s</code></pre>
            <p>A zero/default <code>http.Server</code>, like the one used by the package-level helpers <code>http.ListenAndServe</code> and <code>http.ListenAndServeTLS</code>, comes with no timeouts. You don't want that.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/12k30eO5eG5vdJG4C7Yk6J/89cc5b4dc27bff41da1259bde71b278d/Timeouts.png" />
            
            </figure><p>There are three main timeouts exposed in <code>http.Server</code>: <code>ReadTimeout</code>, <code>WriteTimeout</code> and <code>IdleTimeout</code>. You set them by explicitly using a Server:</p>
            <pre><code>srv := &amp;http.Server{
    ReadTimeout:  5 * time.Second,
    WriteTimeout: 10 * time.Second,
    IdleTimeout:  120 * time.Second,
    TLSConfig:    tlsConfig,
    Handler:      serveMux,
}
log.Println(srv.ListenAndServeTLS("", ""))</code></pre>
            <p><code>ReadTimeout</code> covers the time from when the connection is accepted to when the request body is fully read (if you do read the body, otherwise to the end of the headers). It's implemented in <code>net/http</code> by calling <code>SetReadDeadline</code> <a href="https://github.com/golang/go/blob/3ba31558d1bca8ae6d2f03209b4cae55381175b3/src/net/http/server.go#L750">immediately after Accept</a>.</p><p>The problem with a <code>ReadTimeout</code> is that it doesn't allow a server to give the client more time to stream the body of a request based on the path or the content. Go 1.8 introduces <code>ReadHeaderTimeout</code>, which only covers up to the request headers. However, there's still no clear way to do reads with timeouts from a Handler. Different designs are being discussed in issue <a href="https://golang.org/issue/16100">#16100</a>.</p><p><code>WriteTimeout</code> normally covers the time from the end of the request header read to the end of the response write (a.k.a. the lifetime of the ServeHTTP), by calling <code>SetWriteDeadline</code> <a href="https://github.com/golang/go/blob/3ba31558d1bca8ae6d2f03209b4cae55381175b3/src/net/http/server.go#L753-L755">at the end of readRequest</a>.</p><p>However, when the connection is over HTTPS, <code>SetWriteDeadline</code> is called <a href="https://github.com/golang/go/blob/3ba31558d1bca8ae6d2f03209b4cae55381175b3/src/net/http/server.go#L1477-L1483">immediately after Accept</a> so that it also covers the packets written as part of the TLS handshake. 
Annoyingly, this means that (in that case only) <code>WriteTimeout</code> ends up including the header read and the first byte wait.</p><p>Similarly to <code>ReadTimeout</code>, <code>WriteTimeout</code> is absolute, with no way to manipulate it from a Handler (<a href="https://golang.org/issue/16100">#16100</a>).</p><p>Finally, Go 1.8 <a href="https://github.com/golang/go/issues/14204">introduces <code>IdleTimeout</code></a>, which limits, on the server side, how long a Keep-Alive connection will be kept idle before being reused. Before Go 1.8, the <code>ReadTimeout</code> would start ticking again immediately after a request completed, making it very hostile to Keep-Alive connections: the idle time would eat into the time the client should have had to send the next request, causing unexpected timeouts even for fast clients.</p><p>You should set <code>Read</code>, <code>Write</code> and <code>Idle</code> timeouts when dealing with untrusted clients and/or networks, so that a client can't hold up a connection by being slow to write or read.</p><p>For detailed background on HTTP/1.1 timeouts (up to Go 1.7) read <a href="/the-complete-guide-to-golang-net-http-timeouts/">my post on the Cloudflare blog</a>.</p>
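<p>To see these deadlines in action, here is a self-contained sketch (the timeout values and the loopback setup are illustrative, not a recommendation): a server with a short <code>ReadHeaderTimeout</code> answers a well-behaved client normally, while a client that stalls before sending its headers is disconnected once the timeout fires.</p>

```go
package main

import (
	"bufio"
	"fmt"
	"io/ioutil"
	"net"
	"net/http"
	"strings"
	"time"
)

// demo starts a server with a deliberately short ReadHeaderTimeout (Go 1.8+),
// then connects twice: once sending a full request, once stalling before the
// headers. It returns the status line the polite client saw, and whether the
// stalled client was hung up on roughly when the timeout fired.
func demo() (status string, dropped bool) {
	srv := &http.Server{
		ReadHeaderTimeout: 500 * time.Millisecond, // illustrative value
		Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprint(w, "ok")
		}),
	}
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	go srv.Serve(ln)

	// A well-behaved client gets its response.
	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		panic(err)
	}
	fmt.Fprint(conn, "GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
	status, _ = bufio.NewReader(conn).ReadString('\n')
	conn.Close()

	// A client that never sends its headers stops consuming a file
	// descriptor once the timeout fires: the server closes the connection.
	stalled, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		panic(err)
	}
	start := time.Now()
	ioutil.ReadAll(stalled) // blocks until the server hangs up
	return strings.TrimSpace(status), time.Since(start) >= 400*time.Millisecond
}

func main() {
	status, dropped := demo()
	fmt.Println(status, dropped) // HTTP/1.1 200 OK true
}
```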
    <div>
      <h4>HTTP/2</h4>
      <a href="#http-2">
        
      </a>
    </div>
    <p>HTTP/2 is enabled automatically on any Go 1.6+ server if:</p><ul><li><p>the request is served over TLS/HTTPS</p></li><li><p><code>Server.TLSNextProto</code> is <code>nil</code> (setting it to an empty map is how you disable HTTP/2)</p></li><li><p><code>Server.TLSConfig</code> is set and <code>ListenAndServeTLS</code> is used <b>or</b></p></li><li><p><code>Serve</code> is used and <code>tls.Config.NextProtos</code> includes <code>"h2"</code> (like <code>[]string{"h2", "http/1.1"}</code>, since <code>Serve</code> is called <a href="https://github.com/golang/go/issues/15908">too late to auto-modify the TLS Config</a>)</p></li></ul><p>Timeouts have a slightly different meaning in HTTP/2, since the same connection can be serving different requests at the same time; however, in Go they are abstracted to the same set of Server timeouts.</p><p>Sadly, <code>ReadTimeout</code> breaks HTTP/2 connections in Go 1.7. Instead of being reset for each request it's set once at the beginning of the connection and never reset, breaking all HTTP/2 connections after the <code>ReadTimeout</code> duration. <a href="https://github.com/golang/go/issues/16450">It's fixed in 1.8</a>.</p><p>Between this and the inclusion of idle time in <code>ReadTimeout</code>, my recommendation is to <b>upgrade to 1.8 as soon as possible</b>.</p>
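<p>One way to check the negotiation end-to-end is <code>net/http/httptest</code>. Note this is a modern convenience: the <code>EnableHTTP2</code> knob on <code>httptest.Server</code> was added in a later Go release than the ones discussed here.</p>

```go
package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
	"net/http/httptest"
)

// proto spins up a TLS test server with HTTP/2 enabled and reports the
// protocol the handler observed. httptest takes care of the self-signed
// certificate and of putting "h2" in tls.Config.NextProtos for us.
func proto() string {
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.Proto) // what ALPN negotiated, as seen by the handler
	}))
	srv.EnableHTTP2 = true
	srv.StartTLS()
	defer srv.Close()

	res, err := srv.Client().Get(srv.URL)
	if err != nil {
		panic(err)
	}
	defer res.Body.Close()
	body, _ := ioutil.ReadAll(res.Body)
	return string(body)
}

func main() {
	fmt.Println(proto()) // HTTP/2.0
}
```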
    <div>
      <h4>TCP Keep-Alives</h4>
      <a href="#tcp-keep-alives">
        
      </a>
    </div>
    <p>If you use <code>ListenAndServe</code> (as opposed to passing a <code>net.Listener</code> to <code>Serve</code>, which offers zero protection by default) a TCP Keep-Alive period of three minutes <a href="https://github.com/golang/go/blob/61db2e4efa2a8f558fd3557958d1c86dbbe7d3cc/src/net/http/server.go#L3023-L3039">will be set automatically</a>. That <i>will</i> help with clients that disappear completely off the face of the earth leaving a connection open forever, but I’ve learned not to trust that, and to set timeouts anyway.</p><p>To begin with, three minutes might be too high, which you can solve by implementing your own <a href="https://github.com/golang/go/blob/61db2e4efa2a8f558fd3557958d1c86dbbe7d3cc/src/net/http/server.go#L3023-L3039"><code>tcpKeepAliveListener</code></a>.</p><p>More importantly, a Keep-Alive only makes sure that the client is still responding, but does not place an upper limit on how long the connection can be held. A single malicious client can just open as many connections as your server has file descriptors, hold them half-way through the headers, respond to the rare keep-alives, and effectively take down your service.</p><p>Finally, in my experience connections tend to leak anyway until <a href="https://github.com/FiloSottile/Heartbleed/commit/4a3332ca1dc07aedf24b8540857792f72624cdf7">timeouts are in place</a>.</p>
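<p>A sketch of such a listener (the 30-second period is an arbitrary example; the unexported version inside <code>net/http</code> hardcodes the three minutes mentioned above):</p>

```go
package main

import (
	"fmt"
	"io/ioutil"
	"net"
	"net/http"
	"time"
)

// tcpKeepAliveListener mirrors the unexported listener net/http uses for
// ListenAndServe, but with a configurable (here, much shorter) period.
type tcpKeepAliveListener struct {
	*net.TCPListener
	period time.Duration
}

func (ln tcpKeepAliveListener) Accept() (net.Conn, error) {
	tc, err := ln.AcceptTCP()
	if err != nil {
		return nil, err
	}
	tc.SetKeepAlive(true)
	tc.SetKeepAlivePeriod(ln.period)
	return tc, nil
}

// serveOnce starts a server on the custom listener and performs one request
// against it, returning the response body.
func serveOnce() string {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	ln := tcpKeepAliveListener{l.(*net.TCPListener), 30 * time.Second}

	srv := &http.Server{Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "alive")
	})}
	go srv.Serve(ln)

	res, err := http.Get("http://" + ln.Addr().String())
	if err != nil {
		panic(err)
	}
	defer res.Body.Close()
	body, _ := ioutil.ReadAll(res.Body)
	return string(body)
}

func main() {
	fmt.Println(serveOnce()) // alive
}
```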
    <div>
      <h3>ServeMux</h3>
      <a href="#servemux">
        
      </a>
    </div>
    <p>Package level functions like <code>http.Handle[Func]</code> (and maybe your web framework) register handlers on the global <code>http.DefaultServeMux</code> which is used if <code>Server.Handler</code> is nil. You should avoid that.</p><p>Any package you import, directly or through other dependencies, has access to <code>http.DefaultServeMux</code> and might register routes you don't expect.</p><p>For example, if any package somewhere in the tree imports <code>net/http/pprof</code> clients will be able to get CPU profiles for your application. You can still use <code>net/http/pprof</code> by registering <a href="https://github.com/golang/go/blob/1106512db54fc2736c7a9a67dd553fc9e1fca742/src/net/http/pprof/pprof.go#L67-L71">its handlers</a> manually.</p><p>Instead, instantiate an <code>http.ServeMux</code> yourself, register handlers on it, and set it as <code>Server.Handler</code>. Or set whatever your web framework exposes as <code>Server.Handler</code>.</p>
    <div>
      <h3>Logging</h3>
      <a href="#logging">
        
      </a>
    </div>
    <p><code>net/http</code> does a number of things before yielding control to your handlers: <a href="https://github.com/golang/go/blob/1106512db54fc2736c7a9a67dd553fc9e1fca742/src/net/http/server.go#L2631-L2653"><code>Accept</code>s the connections</a>, <a href="https://github.com/golang/go/blob/1106512db54fc2736c7a9a67dd553fc9e1fca742/src/net/http/server.go#L1718-L1728">runs the TLS Handshake</a>, ...</p><p>If any of these steps goes wrong, a line is written directly to <code>Server.ErrorLog</code>. Some of these, like timeouts and connection resets, are expected on the open Internet. It's not clean, but you can intercept most of those and turn them into metrics by matching them with regexes from the Logger Writer, thanks to this guarantee:</p><blockquote><p>Each logging operation makes a single call to the Writer's Write method.</p></blockquote><p>To abort from inside a Handler without logging a stack trace you can either <code>panic(nil)</code> or in Go 1.8 <code>panic(http.ErrAbortHandler)</code>.</p>
    <div>
      <h3>Metrics</h3>
      <a href="#metrics">
        
      </a>
    </div>
    <p>A metric you'll want to monitor is the number of open file descriptors. <a href="https://github.com/prometheus/client_golang/blob/575f371f7862609249a1be4c9145f429fe065e32/prometheus/process_collector.go">Prometheus does that by using the <code>proc</code> filesystem</a>.</p><p>If you need to investigate a leak, you can use the <code>Server.ConnState</code> hook to get more detailed metrics of what stage the connections are in. However, note that there is no way to keep a correct count of <code>StateActive</code> connections without keeping state, so you'll need to maintain a <code>map[net.Conn]ConnState</code>.</p>
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>The days of needing NGINX in front of all Go services are gone, but you still need to take a few precautions on the open Internet, and probably want to upgrade to the shiny, new Go 1.8.</p><p>Happy serving!</p> ]]></content:encoded>
            <category><![CDATA[TLS]]></category>
            <category><![CDATA[HTTP2]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Programming]]></category>
            <category><![CDATA[Cryptography]]></category>
            <guid isPermaLink="false">5x4k8xtfrkPyeHS7HcYbF7</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Dyn issues affecting joint customers]]></title>
            <link>https://blog.cloudflare.com/dyn-issues-affecting-joint-customers/</link>
            <pubDate>Fri, 21 Oct 2016 18:22:55 GMT</pubDate>
            <description><![CDATA[ Today there is an ongoing, large scale Denial-of-Service attack directed against Dyn DNS. While Cloudflare services are operating normally, if you are using both Cloudflare and Dyn services, your website may be affected. ]]></description>
            <content:encoded><![CDATA[ <p>Today there is an ongoing, large scale Denial-of-Service attack directed against Dyn DNS. While Cloudflare services are operating normally, if you are using both Cloudflare and Dyn services, your website may be affected.</p><p>Specifically, if you are using CNAME records which point to a zone hosted on Dyn, our DNS queries directed to Dyn might fail, making your website unavailable and presenting a “1001” error message.</p><p>Some popular services that might rely on Dyn for part of their operations include GitHub Pages, Heroku, Shopify and AWS.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/8CBsEx31WrHrxQK8sV9fh/5590eaa0c9628a5a4eda472a1fd78ad6/Screen-Shot-2016-10-21-at-19.08.14.png" />
            
            </figure><p>As a possible workaround, you might be able to update your Cloudflare DNS records from CNAMEs (referring to Dyn hosted records) to A/AAAA records specifying the origin IP of your website. This will allow Cloudflare to reach your origin without the need for an external DNS lookup.</p><p>Note that if you use different origin IP addresses, for example based on the geographical location, you may lose some of that functionality by using plain A/AAAA records. We recommend that you provide addresses for many of your different locations, so that load will be shared amongst them.</p><p>Customers with a CNAME setup (which means Cloudflare is not configured in your domain NS records) where the main zone is hosted on a Dyn service will be affected as well. You might be able to make Cloudflare your authoritative DNS provider by contacting support and asking to be changed to Full mode and then updating your nameservers at your <a href="https://www.cloudflare.com/learning/dns/glossary/what-is-a-domain-name-registrar/">registrar</a>, but the change can take up to 48h to propagate.</p><p>Please note that the Cloudflare status page and support system might be affected by the ongoing attack, since they are hosted on third parties, as per industry best practices.</p> ]]></content:encoded>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Attacks]]></category>
            <category><![CDATA[Outage]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">1eSNR5scDOqUvCjp3j813W</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[TLS nonce-nse]]></title>
            <link>https://blog.cloudflare.com/tls-nonce-nse/</link>
            <pubDate>Wed, 12 Oct 2016 15:05:00 GMT</pubDate>
            <description><![CDATA[ One of the base principles of cryptography is that you can't just encrypt multiple messages with the same key. At the very least, what will happen is that two messages that have identical plaintext will also have identical ciphertext, which is a dangerous leak.  ]]></description>
            <content:encoded><![CDATA[ <p>One of the base principles of cryptography is that you can't <i>just</i> encrypt multiple messages with the same key. At the very least, what will happen is that two messages that have identical plaintext will also have identical ciphertext, which is a dangerous leak. (This is similar to why you can't encrypt blocks with <a href="https://blog.filippo.io/the-ecb-penguin/">ECB</a>.)</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7e62kSNRA1U4fnYOyOzcnf/f7170ef03a53336e04f0ce640fe779fe/19fq1n.jpg" />
            
            </figure><p>If you think about it, a pure encryption function is just like any other pure computer function: deterministic. Given the same set of inputs (key and message) it will always return the same output (the encrypted message). And we don't want an attacker to be able to tell that two encrypted messages came from the same plaintext.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2ypDxOdFtrCyJKATmJ8MrZ/6993281f817e294e4c93aded559575b4/Nonces.001-1.png" />
            
            </figure><p>The solution is the use of IVs (Initialization Vectors) or nonces (numbers used once). These are byte strings that are different for each encrypted message. They are the source of non-determinism that is needed to make duplicates indistinguishable. They are usually not secret, and are distributed prepended to the ciphertext, since they are necessary for decryption.</p><p>The distinction between IVs and nonces is controversial and not binary. Different encryption schemes require different properties to be secure: some just need them to never repeat, in which case we commonly refer to them as nonces; some also need them to be random, or even unpredictable, in which case we commonly call them IVs.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3zLMUUtK3IC6m9cEqDEeN8/b1e3f972e89120f0251a86eb9b9043dd/Nonces.002-1.png" />
            
            </figure>
    <div>
      <h3>Nonces in TLS</h3>
      <a href="#nonces-in-tls">
        
      </a>
    </div>
    <p>TLS at its core is about encrypting a stream of packets, or more properly "records". The initial handshake takes care of authenticating the connection and generating the keys, but then it's up to the record layer to encrypt many records with that same key. Enter nonces.</p><p>Nonce management can be a hard problem, but TLS is close to the best case: keys are never reused across connections, and the records have sequence numbers that both sides keep track of. However, it took the protocol a few revisions to fully take advantage of this.</p><p>The resulting landscape is a bit confusing (including one or two attack names):</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6SKHBZfapSldYN2czxNNuH/7f2d30dd41d885dc13b76d9ac6668c35/Nonces-table.png" />
            
            </figure>
    <div>
      <h4>RC4 and stream ciphers</h4>
      <a href="#rc4-and-stream-ciphers">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/VnXTZTwdpRaoyqySKVcDn/ae6e025045313ccfc7da498555196386/Nonces-RC4-black.png" />
            
            </figure><p>RC4 is a stream cipher, so it doesn't have to treat records separately. The cipher generates a continuous keystream which is XOR'd with the plaintexts as if they were just portions of one big message. Hence, there are no nonces.</p><p>RC4 <a href="/tag/rc4/">is broken</a> and was removed from TLS 1.3.</p>
    <div>
      <h4>CBC in TLS 1.0</h4>
      <a href="#cbc-in-tls-1-0">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4qMGKsTc0nc47WCKPrCvjk/f5dbe96ec072da2153b4549c5094d454/Nonces-CBC-1.0-black-1.png" />
            
            </figure><p>CBC in TLS 1.0 works similarly to RC4: the cipher is instantiated once, and then the records are encrypted as part of one continuous message.</p><p>Sadly that means that the IV for the next record <a href="https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Cipher_Block_Chaining_.28CBC.29">is the last block of ciphertext of the previous record</a>, which the attacker can observe. Being able to predict the IV breaks CBC security, and that led to the <a href="https://www.imperialviolet.org/2011/09/23/chromeandbeast.html">BEAST attack</a>. BEAST is mitigated by <a href="https://www.imperialviolet.org/2012/01/15/beastfollowup.html">splitting records in two</a>, which effectively randomizes the IV, but this is a client-side fix, out of the server control.</p>
    <div>
      <h4>CBC in TLS 1.1+</h4>
      <a href="#cbc-in-tls-1-1">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5OaNhFttOtRZjYfEgeIq6v/e7a957af57c1116317a13befa7d5f6b6/Nonces-CBC-explicit-black.png" />
            
            </figure><p>TLS 1.1 fixed BEAST by simply making IVs explicit, sending the IV with each record (with the network overhead that comes with that).</p><p>AES-CBC IVs are 16 bytes (128 bits), so using random bytes is sufficient to prevent collisions.</p><p>CBC has <a href="/yet-another-padding-oracle-in-openssl-cbc-ciphersuites/">other nasty design issues</a> and has been removed in TLS 1.3.</p>
    <div>
      <h4>TLS 1.2 GCM</h4>
      <a href="#tls-1-2-gcm">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3bbl3iFzHYooGKXgbGZYCo/389f3e082c1097f97933391d8c7ed9fd/Nonces-GCM-black-2.png" />
            
            </figure><p>TLS 1.2 inherited the 1.1 explicit IVs. It also introduced <a href="/it-takes-two-to-chacha-poly/">AEADs</a> like AES-GCM. The record nonce in 1.2 AES-GCM is a concatenation of a fixed per-connection IV (4 bytes, derived at the same time as the key) and an explicit per-record nonce (8 bytes, sent on the wire).</p><p>Since <a href="https://en.wikipedia.org/wiki/Birthday_problem">8 random bytes is too short to guarantee uniqueness</a>, 1.2 GCM implementations have to use the sequence number or a counter. If you are thinking "but what sense does it make to use an explicit IV, sent on the wire, which is just the sequence number that both parties know anyway", well... yeah.</p><p>Implementations not using a counter/sequence-based AES-GCM nonce were found to be indeed vulnerable by the "<a href="https://github.com/nonce-disrespect/nonce-disrespect">Nonce-Disrespecting Adversaries</a>" paper.</p>
    <div>
      <h4>TLS 1.3</h4>
      <a href="#tls-1-3">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6X6R46MkjuADDzdspGpNAp/313ba89f66a1a304c169d1e9cd801eaf/Nonces-1.3-black-1.png" />
            
            </figure><p>TLS 1.3 finally took advantage of the sequential nature of TLS records and removed the free-form explicit IVs. It instead uses a combination of a fixed per-connection IV (derived at the same time as the key) and the sequence number, XORed together rather than concatenated.</p><p>This way the entire length of the nonce is random-looking, nonces can never be reused as the sequence number monotonically increases, and there is no network overhead.</p>
    <div>
      <h4>ChaCha20-Poly1305</h4>
      <a href="#chacha20-poly1305">
        
      </a>
    </div>
    <p>The <a href="/do-the-chacha-better-mobile-performance-with-cryptography/">ChaCha20-Poly1305 ciphersuite</a> uses the same "fixed IV XORed with the sequence number" scheme as TLS 1.3, even when used in TLS 1.2.</p><p>While 1.3 AEADs and 1.2 ChaCha20 use the same nonce scheme, when used in 1.2, ChaCha20 still puts the sequence number, type, version and length in the additional authenticated data. 1.3 makes all those either implicit or part of the encrypted payload.</p>
    <div>
      <h3>To recap</h3>
      <a href="#to-recap">
        
      </a>
    </div>
    <ul><li><p>RC4 is a stream cipher, so it has no per-record nonce.</p></li><li><p>CBC in TLS 1.0 used to work similarly to RC4. Sadly, that was vulnerable to BEAST.</p></li><li><p>TLS 1.1 fixed BEAST by simply making IVs explicit and random.</p></li><li><p>TLS 1.2 AES-GCM uses a concatenation of a fixed IV and an explicit sequential nonce.</p></li><li><p>TLS 1.3 finally uses a simple fixed IV XORed with the sequence number.</p></li><li><p>ChaCha20-Poly1305 uses the same scheme as TLS 1.3 even when used in TLS 1.2.</p></li></ul>
    <div>
      <h2>Nonce misuse resistance</h2>
      <a href="#nonce-misuse-resistance">
        
      </a>
    </div>
    <p>In the introduction we used the case of two identical messages encrypted with the same key to illustrate the most intuitive issue of missing or reused nonces. However, depending on the cipher, other things can go wrong when the same nonce is reused, or is predictable.</p><p>A repeated nonce often entirely breaks the security properties of the connection. For example, AES-GCM <a href="https://github.com/nonce-disrespect/nonce-disrespect">leaks the authentication key altogether</a>, allowing an attacker to fake packets and inject data.</p><p>As part of the trend of making cryptography primitives less dangerous for implementers to use, research is focusing on mitigating the adverse consequences of nonce reuse. The property of these new schemes is called <a href="https://www.lvh.io/posts/nonce-misuse-resistance-101.html">nonce misuse resistance</a>.</p><p>However, these schemes have yet to see wider adoption and standardization, which is why a solid protocol design like the one in TLS 1.3 is critical to prevent this class of attacks.</p><p><i>Does painting overviews of technical topics like this sound satisfying to you? </i><a href="https://www.cloudflare.com/join-our-team/"><i>We are hiring in London, Austin (TX), Champaign (IL), San Francisco and Singapore</i></a><i>!</i></p> ]]></content:encoded>
            <category><![CDATA[TLS]]></category>
            <category><![CDATA[TLS 1.3]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Encryption]]></category>
            <category><![CDATA[Cryptography]]></category>
            <guid isPermaLink="false">1xqvUPkstNMoVVXMQ3fy5C</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[An overview of TLS 1.3 and Q&A]]></title>
            <link>https://blog.cloudflare.com/tls-1-3-overview-and-q-and-a/</link>
            <pubDate>Fri, 23 Sep 2016 16:01:18 GMT</pubDate>
            <description><![CDATA[ The CloudFlare London office hosts weekly internal Tech Talks (with free lunch picked by the speaker). My recent one was an explanation of the latest version of TLS, 1.3, how it works and why it's faster and safer. ]]></description>
            <content:encoded><![CDATA[ <p>The CloudFlare London office hosts weekly internal Tech Talks (with free lunch picked by the speaker). My recent one was an explanation of the latest version of <a href="https://www.cloudflare.com/ssl/">TLS, 1.3</a>, how it works and why it's faster and safer.</p><p>You can <a href="https://vimeo.com/177333631">watch the complete talk</a> below or just read my summarized transcript.</p><p><i>Update: you might want to watch my more recent and extended </i><a href="/tls-1-3-explained-by-the-cloudflare-crypto-team-at-33c3/"><i>33c3 talk</i></a><i> instead.</i></p><p><b>The Q&amp;A session is open!</b> Send us your questions about TLS 1.3 at <a>tls13@cloudflare.com</a> or leave them in the Disqus comments below and I'll answer them in an upcoming blog post.</p><p>.post-content iframe { margin: 0; }</p>
    <div>
      <h4>Summarized transcript</h4>
      <a href="#summarized-transcript">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7Bq6YFjnvg3SB2kKRjfNsr/4be9ff4c51df8eb9b9dbd9eb6f499f13/TLS-1.3.003.png" />
            
            </figure><p>To understand why TLS 1.3 is awesome, we need to take a step back and look at how TLS 1.2 works. In particular we will look at modern TLS 1.2, the kind that a recent browser would use when connecting to the CloudFlare edge.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/8ohjI9yrzu2eEm0U6iOu9/654853ffc8128cef9c4f17ac6dfb28cb/TLS-1.3.004.png" />
            
            </figure><p>The client starts by sending a message called the <code>ClientHello</code> that essentially says "hey, I want to speak TLS 1.2, with one of these cipher suites".</p><p>The server receives that and answers with a <code>ServerHello</code> that says "sure, let's speak TLS 1.2, and I pick <i>this</i> cipher suite".</p><p>Along with that the server sends its <i>key share</i>. The specifics of this key share change based on what cipher suite was selected. When using ECDHE, key shares are mixed with the <a href="/a-relatively-easy-to-understand-primer-on-elliptic-curve-cryptography/">Elliptic Curve Diffie Hellman</a> algorithm.</p><p>The important part to understand is that for the client and server to agree on a cryptographic key, they need to receive each other's portion, or share.</p><p>Finally, the server sends the website certificate (signed by the CA) and a signature on portions of <code>ClientHello</code> and <code>ServerHello</code>, including the key share, so that the client knows that those are authentic.</p><p>The client receives all that, and <i>then</i> generates its own key share, mixes it with the server key share, and thus generates the encryption keys for the session.</p><p>Finally, the client sends the server its key share, enables encryption and sends a <code>Finished</code> message (which is a hash of a transcript of what happened so far). The server does the same: it mixes the key shares to get the key and sends its own <code>Finished</code> message.</p><p>At that point we are done, and we can finally send useful data encrypted on the connection.</p><p>Notice that this takes two round-trips between the client and the server before the HTTP request can be transferred. And <a href="https://www.cloudflare.com/learning/cdn/glossary/round-trip-time-rtt/">round-trips on the Internet</a> can be slow.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3d2jAu0W5lMg2PQm8zrdtz/221af7783bc17706e2d3a2587ecf1312/TLS-1.3.006.png" />
            
            </figure><p>Enter TLS 1.3. While TLS 1.0, 1.1 and 1.2 are not that different, 1.3 is a big jump.</p><p>Most importantly, establishing a TLS 1.3 connection takes <b>one less round-trip</b>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1hr3R0qLxh2XIklK8lNJdh/8b4cbbea91b113ba182294c1bef64695/TLS-1.3.007.png" />
            
            </figure><p>In TLS 1.3 a client doesn't just send the <code>ClientHello</code> and the list of supported ciphers; it also makes a guess as to which key agreement algorithm the server will choose, and <b>sends a key share for that</b>.</p><p>(<i>Note: the video calls the key agreement algorithm "cipher suite". In the meantime the specification has been changed to separate supported cipher suites like AES-GCM-SHA256 and supported key agreements like ECDHE P-256.</i>)</p><p>And that saves us a round trip, because as soon as the server selects the cipher suite and key agreement algorithm, it's ready to generate the key, as it already has the client key share. So it can switch to encrypted packets one whole round-trip in advance.</p><p>So the server sends the <code>ServerHello</code>, its key share, the certificate (now encrypted, since it has a key!), and already the <code>Finished</code> message.</p><p>The client receives all that, generates the keys using the key share, checks the certificate and <code>Finished</code>, and it's immediately ready to send the HTTP request, after only one round-trip. That saved round-trip can be worth hundreds of milliseconds.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5Uvkj3F2TDTPupYTuJOCdE/03d16a27196bb78a5cafebecf12e9b34/TLS-1.3.009.png" />
            
            </figure><p>One existing way to speed up TLS connections is called resumption. It's what happens when the client has connected to that server before, and uses what they remember from the last time to cut short the handshake.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7Kb8uoWeYCfHtOsR4jKTig/4e5ea4f6cdd0fd2498db35e1c51ecdac/TLS-1.3.010.png" />
            
            </figure><p>How this worked in TLS 1.2 is that servers would send the client either a <a href="/tls-session-resumption-full-speed-and-secure/">Session ID or a Session Ticket</a>. The former is just a reference number that the server can trace back to a session, while the latter is an encrypted serialized session which allows the server not to keep state.</p><p>The next time the client would connect, it would send the Session ID or Ticket in the <code>ClientHello</code>, and the server would go like "hey, I know you, we have agreed on a key already", skip the whole key shares dance, and jump straight to <code>Finished</code>, saving a round-trip.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3T989Gfy9u4y0b5WlC2YyJ/926f71e94f528ae25d275bad8399e0fd/TLS-1.3.011.png" />
            
            </figure><p>So, we have a way to do 1-RTT connections in 1.2 if the client has connected before, which is very common. Then what does 1.3 gain us? When resumption is available, <b>1.3 allows us to do 0-RTT connections</b>, again saving one round trip and ending up with no round trip at all.</p><p>If you have connected to a 1.3 server before you can immediately start sending encrypted data, like an HTTP request, without any round-trip at all, making TLS essentially <b>zero overhead</b>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4x3MsWrJJB60co3tWBTcfD/cc3976fa1688d6bcb39f4fdae04c06ae/TLS-1.3.012.png" />
            
            </figure><p>When a 1.3 client connects to a 1.3 server they agree on a resumption key (or PSK, pre-shared key), and the server gives the client a Session Ticket that will help it remember it. The Ticket can be an encrypted copy of the PSK—to avoid state—or a reference number.</p><p>The next time the client connects, it sends the Session Ticket in the <code>ClientHello</code> and then immediately, without waiting for any round trip, sends the HTTP request encrypted with the PSK. The server figures out the PSK from the Session Ticket and uses that to decrypt the 0-RTT data.</p><p>The client also sends a key share, so that client and server can switch to a new fresh key for the actual HTTP response and the rest of the connection.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3bxXx5kuJvmtOYmxa1JNcy/1cdbfb3ee46d508d6eabcafd44e043b7/TLS-1.3.013.png" />
            
            </figure><p>0-RTT comes with a couple of caveats.</p><p>Since the PSK is not agreed upon with a fresh round of Diffie Hellman, it does not provide Forward Secrecy against a compromise of the Session Ticket key. That is, if in a year an attacker somehow obtains the Session Ticket key, it can decrypt the Session Ticket, obtain the PSK and decrypt the 0-RTT data the client sent (but not the rest of the connection).</p><p>This is why it's important to rotate Session Ticket keys often and not persist them (CloudFlare rotates these keys hourly).</p><p>TLS 1.2 has never provided any Forward Secrecy against a compromise of the Session Ticket key at all, so even with 0-RTT 1.3 is an improvement upon 1.2.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/79qHz08hcnfJAk8ogogTfH/1822898b0fc6f0262a2b9c7ae9df0eba/TLS-1.3.014.png" />
            
            </figure><p>More problematic are replay attacks.</p><p>Since with Session Tickets servers are stateless, they have no way to know if a packet of 0-RTT data was already sent before.</p><p>Imagine that the 0-RTT data a client sent is not an HTTP GET ("hey, send me this page") but instead an HTTP POST executing a transaction like "hey, send Filippo 50$". If I'm in the middle I can intercept that <code>ClientHello</code>+0-RTT packet, and then re-send it to the server 100 times. No need to know any key. I now have 5000$.</p><p>Every time, the server will see a Session Ticket, unwrap it to find the PSK, use the PSK to decrypt the 0-RTT data and find the HTTP POST inside, with no way to know something is fishy.</p><p>The solution is that servers must not execute non-idempotent operations received in 0-RTT data. Instead, in those cases, they should force the client to perform a full 1-RTT handshake. That protects from replay since each <code>ClientHello</code> and <code>ServerHello</code> come with a Random value and connections have sequence numbers, so there's no way to replay recorded traffic verbatim.</p><p>Thankfully, most times the first request a client sends is not a state-changing transaction, but something idempotent like a GET.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2oeeDc9HROItV41GXxqggs/12f35c007af3f031ab681968ff807dad/TLS-1.3.016.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4u8bypRISwPgvFeXhAZfuU/bbcdaf44a4b89aa05390b5e85a4a1a5a/TLS-1.3.017.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/45Ejce1oMIH6MiWRDtalo3/537497589b0bf99b28ec8b8488c8279e/TLS-1.3.018.png" />
            
            </figure><p>TLS 1.3 is not only good for cutting a round-trip. It's also better, more robust crypto all around.</p><p>Most importantly, many things were removed. 1.3 marked a shift in the design approach: it used to be the case that the TLS committee would accept any proposal that made sense, and implementations like OpenSSL would add support for it. Think, for example, of Heartbeats, the rarely used feature that caused <a href="/the-results-of-the-cloudflare-challenge/">Heartbleed</a>.</p><p>In 1.3, everything was scrutinized for being really necessary and secure, and scrapped otherwise. A lot of things are gone:</p><ul><li><p>the old <a href="/keyless-ssl-the-nitty-gritty-technical-details/">static RSA handshake without Diffie Hellman</a>, which doesn't offer Forward Secrecy</p></li><li><p>the <a href="/padding-oracles-and-the-decline-of-cbc-mode-ciphersuites/">CBC MAC-then-Encrypt modes</a>, which were responsible for Vaudenay, Lucky13, POODLE, <a href="/yet-another-padding-oracle-in-openssl-cbc-ciphersuites/">LuckyMinus20</a>... replaced by <a href="/go-crypto-bridging-the-performance-gap/">AEADs</a></p></li><li><p>weak primitives like <a href="/killing-rc4-the-long-goodbye/">RC4</a>, SHA1, MD5</p></li><li><p>compression</p></li><li><p>renegotiation</p></li><li><p>custom FFDHE groups</p></li><li><p>RSA PKCS#1v1.5</p></li><li><p>explicit nonces</p></li></ul><p>We'll go over these in more detail in future blog posts.</p><p>Some of these were not necessarily broken by design, but they were dangerous, hard to implement correctly and easy to get wrong. The excellent new trend of TLS 1.3 and cryptography in general is to make mistakes less likely at the design stage, since humans are not perfect.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/56pPwUsCFfumH2fYMeL8FN/35aa6d17ae21394702ce0122e6bdfb5c/TLS-1.3.019.png" />
            
            </figure><p>A new version of a protocol obviously can't dictate how older implementations behave, and 1.3 can't improve the security of 1.2 systems. So how do you make sure that if tomorrow TLS 1.2 is completely broken, a client and server that both support 1.2 and 1.3 can't be tricked into using 1.2 by a proxy?</p><p>A MitM could change the <code>ClientHello</code> to say "I want to talk at most TLS 1.2", and then use whichever attack it discovered to make the 1.2 connection succeed even if it tampered with a piece of the handshake.</p><p>1.3 has a clever solution to this: if a 1.3 server has to use 1.2 because it looks like the client doesn't support 1.3, it will "hide a message" in the Server Random value. A real 1.2 client will completely ignore it, but a client that supports 1.3 would know to look for it, and would discover that it's being tricked into downgrading to 1.2.</p><p>The Server Random is signed with the certificate in 1.2, so it's impossible to fake even if pieces of 1.2 are broken. This is very important because it will allow us to keep supporting 1.2 in the future even if it's found to be weaker, unlike what we had to do with <a href="/sslv3-support-disabled-by-default-due-to-vulnerability/">SSLv3 and POODLE</a>. With 1.3 we will know for sure that clients that can do any better are not being put at risk, allowing us to make sure <a href="/ensuring-that-the-web-is-for-everyone/">the Internet is for Everyone</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2Q7LYbjQBvS2WP0PUvkmdk/6fb34646631b248c2d3750b4b2c198c5/TLS-1.3.020.png" />
            
            </figure><p>So this is TLS 1.3. Meant to be a solid, safe, robust, simple, essential foundation for Internet encryption for the years to come. And it's faster, so that no one will have performance reasons not to implement it.</p><p>TLS 1.3 is still a draft and it might change before being finalized, but at CloudFlare we are actively developing a 1.3 stack compatible with current experimental browsers, so <a href="/introducing-tls-1-3/">everyone can get it today</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3xQPlSJ4uBQaS60OpFogxD/7967fae7cd0f8766658a02c0534b95c5/TLS-1.3.023.png" />
            
            </figure><p>The TLS 1.3 spec is <a href="https://github.com/tlswg/tls13-spec">on GitHub</a>, so anyone can contribute. Just while making the slides for this presentation I noticed I was having a hard time understanding a system because a diagram was missing some details, so I submitted a PR to fix it. How easy is that!?</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3Ecgvi8YtzFKhcoQjoPLzU/cb6a281fbd8246c453bc145ca36a7b56/TLS-1.3.026.png" />
            
            </figure><p>Like any talk, at the end there's the Q&amp;A. Send your questions to <a>tls13@cloudflare.com</a> or leave them in the Disqus comments below and I'll answer them in an upcoming blog post!</p> ]]></content:encoded>
            <category><![CDATA[TLS]]></category>
            <category><![CDATA[TLS 1.3]]></category>
            <category><![CDATA[Events]]></category>
            <category><![CDATA[United Kingdom]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Cryptography]]></category>
            <guid isPermaLink="false">6CKPHn0MEFqMdmC3vvaDvo</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[The complete guide to Go net/http timeouts]]></title>
            <link>https://blog.cloudflare.com/the-complete-guide-to-golang-net-http-timeouts/</link>
            <pubDate>Wed, 29 Jun 2016 13:09:27 GMT</pubDate>
            <description><![CDATA[ When writing an HTTP server or client in Go, timeouts are amongst the easiest and most subtle things to get wrong: there are many to choose from, and a mistake can have no consequences for a long time, until the network glitches and the process hangs. ]]></description>
            <content:encoded><![CDATA[ <p>When writing an HTTP server or client in Go, timeouts are amongst the easiest and most subtle things to get wrong: there are many to choose from, and a mistake can have no consequences for a long time, until the network glitches and the process hangs.</p><p><a href="https://www.cloudflare.com/learning/ddos/glossary/hypertext-transfer-protocol-http/">HTTP</a> is a complex multi-stage protocol, so there's no one-size-fits-all solution to timeouts. Think about a <a href="https://www.cloudflare.com/learning/video/what-is-streaming/">streaming</a> endpoint versus a JSON API versus a <a href="https://en.wikipedia.org/wiki/Comet_%28programming%29">Comet</a> endpoint. Indeed, the defaults are often not what you want.</p><p>In this post I’ll take apart the various stages you might need to apply a timeout to, and look at the different ways to do it, on both the Server and the Client side.</p>
    <div>
      <h3>SetDeadline</h3>
      <a href="#setdeadline">
        
      </a>
    </div>
    <p>First, you need to know about the network primitive that Go exposes to implement timeouts: Deadlines.</p><p>Exposed by <a href="https://golang.org/pkg/net/#Conn"><code>net.Conn</code></a> with the <code>Set[Read|Write]Deadline(time.Time)</code> methods, Deadlines are an absolute point in time which, when reached, makes all I/O operations fail with a timeout error.</p><p><b>Deadlines are not timeouts.</b> Once set they stay in force forever (or until the next call to <code>SetDeadline</code>), no matter if and how the connection is used in the meantime. So to build a timeout with <code>SetDeadline</code> you'll have to call it before <i>every</i> <code>Read</code>/<code>Write</code> operation.</p><p>You probably don't want to call <code>SetDeadline</code> yourself; let <code>net/http</code> call it for you instead, using its higher-level timeouts. However, keep in mind that all timeouts are implemented in terms of Deadlines, so they <b>do NOT reset every time data is sent or received</b>.</p>
    <div>
      <h3>Server Timeouts</h3>
      <a href="#server-timeouts">
        
      </a>
    </div>
    <p><i>The </i><a href="/exposing-go-on-the-internet/"><i>"So you want to expose Go on the Internet" post</i></a><i> has more information on server timeouts, in particular about HTTP/2 and Go 1.7 bugs.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/sOlH0CkRueHYFHY9JhYag/3688eee879f10624eb42b75302000929/Timeouts-001.png" />
            
            </figure><p>It's critical for an HTTP server exposed to the Internet to enforce timeouts on client connections. Otherwise very slow or disappearing clients might leak file descriptors and eventually result in something along the lines of:</p>
            <pre><code>http: Accept error: accept tcp [::]:80: accept4: too many open files; retrying in 5ms</code></pre>
            <p>There are two timeouts exposed in <code>http.Server</code>: <code>ReadTimeout</code> and <code>WriteTimeout</code>. You set them by explicitly using a Server:</p>
            <pre><code>srv := &amp;http.Server{
    ReadTimeout: 5 * time.Second,
    WriteTimeout: 10 * time.Second,
}
log.Println(srv.ListenAndServe())</code></pre>
            <p><code>ReadTimeout</code> covers the time from when the connection is accepted to when the request body is fully read (if you do read the body, otherwise to the end of the headers). It's implemented in <code>net/http</code> by calling <code>SetReadDeadline</code> <a href="https://github.com/golang/go/blob/3ba31558d1bca8ae6d2f03209b4cae55381175b3/src/net/http/server.go#L750">immediately after Accept</a>.</p><p><code>WriteTimeout</code> normally covers the time from the end of the request header read to the end of the response write (a.k.a. the lifetime of the ServeHTTP), by calling <code>SetWriteDeadline</code> <a href="https://github.com/golang/go/blob/3ba31558d1bca8ae6d2f03209b4cae55381175b3/src/net/http/server.go#L753-L755">at the end of readRequest</a>.</p><p>However, when the connection is <a href="https://www.cloudflare.com/learning/ssl/what-is-https/">HTTPS</a>, <code>SetWriteDeadline</code> is called <a href="https://github.com/golang/go/blob/3ba31558d1bca8ae6d2f03209b4cae55381175b3/src/net/http/server.go#L1477-L1483">immediately after Accept</a> so that it also covers the packets written as part of the TLS handshake. Annoyingly, this means that (in that case only) <code>WriteTimeout</code> ends up including the header read and the first byte wait.</p><p>You should set both timeouts when you deal with untrusted clients and/or networks, so that a client can't hold up a connection by being slow to write or read.</p><p>Finally, there's <a href="https://golang.org/pkg/net/http/#TimeoutHandler"><code>http.TimeoutHandler</code></a>. It’s not a Server parameter, but a Handler wrapper that limits the maximum duration of <code>ServeHTTP</code> calls. It works by buffering the response, and sending a <i>504 Gateway Timeout</i> instead if the deadline is exceeded. Note that it is <a href="https://github.com/golang/go/issues/15327">broken in 1.6 and fixed in 1.6.2</a>.</p>
    <div>
      <h4>http.ListenAndServe is doing it wrong</h4>
      <a href="#http-listenandserve-is-doing-it-wrong">
        
      </a>
    </div>
    <p>Incidentally, this means that the package-level convenience functions that bypass <code>http.Server</code> like <code>http.ListenAndServe</code>, <code>http.ListenAndServeTLS</code> and <code>http.Serve</code> are unfit for public Internet servers.</p><p>Those functions leave the timeouts at their default off value, with no way of enabling them, so if you use them you'll soon be leaking connections and run out of file descriptors. I've made this mistake at least half a dozen times.</p><p>Instead, create an <code>http.Server</code> instance with <code>ReadTimeout</code> and <code>WriteTimeout</code> and use its corresponding methods, like in the example a few paragraphs above.</p>
    <div>
      <h4>About streaming</h4>
      <a href="#about-streaming">
        
      </a>
    </div>
    <p>Very annoyingly, there is no way of accessing the underlying <code>net.Conn</code> from <code>ServeHTTP</code> so a server that intends to stream a response is forced to unset the <code>WriteTimeout</code> (which is also possibly why they are 0 by default). This is because without <code>net.Conn</code> access, there is no way of calling <code>SetWriteDeadline</code> before each <code>Write</code> to implement a proper idle (not absolute) timeout.</p><p>Also, there's no way to cancel a blocked <code>ResponseWriter.Write</code> since <code>ResponseWriter.Close</code> (which you can access via an interface upgrade) is not documented to unblock a concurrent Write. So there's no way to build a timeout manually with a Timer, either.</p><p>Sadly, this means that streaming servers can't really defend themselves from a slow-reading client.</p><p>I submitted <a href="https://github.com/golang/go/issues/16100">an issue with some proposals</a>, and I welcome feedback there.</p>
    <div>
      <h3>Client Timeouts</h3>
      <a href="#client-timeouts">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/255U6cjjLCH1vUEJoA9QWo/b73eae7e11ea4e42c847a5ee9c9c8f96/Timeouts-002.png" />
            
            </figure><p>Client-side timeouts can be simpler or much more complex, depending on which ones you use, but are just as important to prevent leaking resources or getting stuck.</p><p>The easiest to use is the <code>Timeout</code> field of <a href="https://golang.org/pkg/net/http/#Client"><code>http.Client</code></a>. It covers the entire exchange, from Dial (if a connection is not reused) to reading the body.</p>
            <pre><code>c := &amp;http.Client{
    Timeout: 15 * time.Second,
}
resp, err := c.Get("https://blog.filippo.io/")</code></pre>
            <p>Like the server-side case above, the package-level functions such as <code>http.Get</code> use <a href="https://golang.org/pkg/net/http/#DefaultClient">a Client without timeouts</a>, so are dangerous to use on the open Internet.</p><p>For more granular control, there are a number of other more specific timeouts you can set:</p><ul><li><p><code>net.Dialer.Timeout</code> limits the time spent establishing a TCP connection (if a new one is needed).</p></li><li><p><code>http.Transport.TLSHandshakeTimeout</code> limits the time spent performing the TLS handshake.</p></li><li><p><code>http.Transport.ResponseHeaderTimeout</code> limits the time spent reading the headers of the response.</p></li><li><p><code>http.Transport.ExpectContinueTimeout</code> limits the time the client will wait between sending the request headers <i>when including an </i><code><i>Expect: 100-continue</i></code> and receiving the go-ahead to send the body. Note that setting this in 1.6 <a href="https://github.com/golang/go/issues/14391">will disable HTTP/2</a> (<code>DefaultTransport</code> <a href="https://github.com/golang/go/commit/406752b640fcc56a9287b8454564cffe2f0021c1#diff-6951e7593bfb1e773c9121df44df1c36R179">is special-cased from 1.6.2</a>).</p></li></ul>
            <pre><code>c := &amp;http.Client{
    Transport: &amp;http.Transport{
        Dial: (&amp;net.Dialer{
                Timeout:   30 * time.Second,
                KeepAlive: 30 * time.Second,
        }).Dial,
        TLSHandshakeTimeout:   10 * time.Second,
        ResponseHeaderTimeout: 10 * time.Second,
        ExpectContinueTimeout: 1 * time.Second,
    },
}</code></pre>
            <p>As far as I can tell, there's no way to limit the time spent sending the request specifically. The time spent reading the response body can be controlled manually with a <code>time.Timer</code> since it happens after the Client method returns (see below for how to cancel a request).</p><p>Finally, new in 1.7, there's <code>http.Transport.IdleConnTimeout</code>. It does not control a blocking phase of a client request, but how long an idle connection is kept in the connection pool.</p><p>Note that a Client will follow redirects by default. <code>http.Client.Timeout</code> includes all time spent following redirects, while the granular timeouts are specific for each request, since <code>http.Transport</code> is a lower-level system that has no concept of redirects.</p>
    <div>
      <h4>Cancel and Context</h4>
      <a href="#cancel-and-context">
        
      </a>
    </div>
    <p><code>net/http</code> offers two ways to cancel a client request: <code>Request.Cancel</code> and, new in 1.7, Context.</p><p><code>Request.Cancel</code> is an optional channel that, when set and then closed, causes the request to abort as if <code>http.Client.Timeout</code> had been hit. (They are actually implemented through the same mechanism, and while writing this post I <a href="https://github.com/golang/go/issues/16094">found a bug</a> in 1.7 where all cancellations would be returned as timeout errors.)</p><p>We can use <code>Request.Cancel</code> and <code>time.Timer</code> to build a more granular timeout that allows streaming, pushing the deadline back every time we successfully read some data from the Body:</p>
            <pre><code>package main

import (
	"io"
	"io/ioutil"
	"log"
	"net/http"
	"time"
)

func main() {
	c := make(chan struct{})
	timer := time.AfterFunc(5*time.Second, func() {
		close(c)
	})

	// Serve 256 bytes every second.
	req, err := http.NewRequest("GET", "http://httpbin.org/range/2048?duration=8&amp;chunk_size=256", nil)
	if err != nil {
		log.Fatal(err)
	}
	req.Cancel = c

	log.Println("Sending request...")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	log.Println("Reading body...")
	for {
		timer.Reset(2 * time.Second)
		// Try instead: timer.Reset(50 * time.Millisecond)
		_, err = io.CopyN(ioutil.Discard, resp.Body, 256)
		if err == io.EOF {
			break
		} else if err != nil {
			log.Fatal(err)
		}
	}
}</code></pre>
            <p>In the example above, we put a timeout of 5 seconds on the Do phase of the request, but then we spend at least 8 seconds reading the body in 8 rounds, each time with a timeout of 2 seconds. We could go on streaming like this forever without risk of getting stuck. If we stopped receiving body data for more than 2 seconds, io.CopyN would return a <code>net/http: request canceled</code> error.</p><p>In 1.7, the <code>context</code> package graduated to the standard library. There's <a href="https://blog.golang.org/context">a lot to learn about Contexts</a>, but for our purposes you should know that they replace and deprecate <code>Request.Cancel</code>.</p><p>To use Contexts to cancel a request we just obtain a new Context and its <code>cancel()</code> function with <code>context.WithCancel</code> and create a Request bound to it with <code>Request.WithContext</code>. When we want to cancel the request, we cancel the Context by calling <code>cancel()</code> (instead of closing the Cancel channel):</p>
            <pre><code>ctx, cancel := context.WithCancel(context.TODO())
timer := time.AfterFunc(5*time.Second, func() {
	cancel()
})

req, err := http.NewRequest("GET", "http://httpbin.org/range/2048?duration=8&amp;chunk_size=256", nil)
if err != nil {
	log.Fatal(err)
}
req = req.WithContext(ctx)</code></pre>
            <p>Contexts have the advantage that if the parent context (the one we passed to <code>context.WithCancel</code>) is canceled, ours will be, too, propagating the command down the entire pipeline.</p><p>This is all. I hope I didn't exceed your <code>ReadDeadline</code>!</p><p>If this kind of deep dive into the Go standard library sounds entertaining to you, know that <a href="https://www.cloudflare.com/join-our-team/">we are hiring in London, Austin (TX), Champaign (IL), San Francisco and Singapore.</a></p> ]]></content:encoded>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[API]]></category>
            <category><![CDATA[Programming]]></category>
            <category><![CDATA[Go]]></category>
            <guid isPermaLink="false">5IUTFQotCX6VHDqnrENjnx</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Yet Another Padding Oracle in OpenSSL CBC Ciphersuites]]></title>
            <link>https://blog.cloudflare.com/yet-another-padding-oracle-in-openssl-cbc-ciphersuites/</link>
            <pubDate>Wed, 04 May 2016 12:20:42 GMT</pubDate>
            <description><![CDATA[ Yesterday a new vulnerability was announced in OpenSSL/LibreSSL. A padding oracle in CBC mode decryption, to be precise. Just like Lucky13. Actually, it’s in the code that fixes Lucky13. ]]></description>
            <content:encoded><![CDATA[ <p>Yesterday a new vulnerability <a href="https://www.openssl.org/news/secadv/20160503.txt">was announced</a> in OpenSSL/LibreSSL. A <i>padding oracle in CBC mode decryption</i>, to be precise. Just like <a href="https://en.wikipedia.org/wiki/Lucky_Thirteen_attack">Lucky13</a>. Actually, it’s in the code that fixes Lucky13.</p><p>It was found by <a href="http://web-in-security.blogspot.co.uk/2016/05/curious-padding-oracle-in-openssl-cve.html">Juraj Somorovsky</a> using a tool he developed called <a href="https://github.com/RUB-NDS/TLS-Attacker">TLS-Attacker</a>. Like in the “old days”, it has no name except CVE-2016-2107. (I call it LuckyNegative20.[^1])</p><p>It’s a wonderful example of a padding oracle in constant time code, so we’ll dive deep into it. But first, two quick background paragraphs. If you already know all about Lucky13 and how it's mitigated in OpenSSL, <a href="#offby20">jump to "Off by 20"</a> for the hot and new.</p><p>If, before reading, you want to check that your server is safe, you can do it <a href="https://filippo.io/CVE-2016-2107/">with this one-click online test</a>.</p>
    <div>
      <h3>TLS, CBC, and Mac-then-Encrypt</h3>
      <a href="#tls-cbc-and-mac-then-encrypt">
        
      </a>
    </div>
    <p>Very long story short, the CBC cipher suites in TLS have a design flaw: they first compute the HMAC of the plaintext, then encrypt <code>plaintext || HMAC || padding || padding length</code> using CBC mode. The receiving end is then left with the uncomfortable task of decrypting the message and checking HMAC and padding <i>without revealing the padding length in any way</i>. If they do, we call that a <b>padding oracle</b>, and a MitM can use it to learn the value of the last byte of any block, and by iteration often the entire message.</p>
            <figure>
            <a href="https://moxie.org/2011/12/13/the-cryptographic-doom-principle.html">
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/44IkC1Y8GbkE4EY00AfCBO/2bf4bf76ebbc1f91d4e060b66d7e4c7d/S22C-6e16050319300-1.jpg" />
            </a>
            </figure><p>In other words, the CBC mode cipher suites are doomed by <a href="https://moxie.org/2011/12/13/the-cryptographic-doom-principle.html">The Cryptographic Doom Principle</a>. Sadly though, they are the only “safe” cipher suites to use with TLS prior to version 1.2, and they account for <b>26% of connections to the CloudFlare edge</b>, so attacks against them like this one are still very relevant.</p><p>My colleague Nick Sullivan explained CBC padding oracles and their history in much more detail <a href="https://blog.cloudflare.com/padding-oracles-and-the-decline-of-cbc-mode-ciphersuites/">on this very blog</a>. If you didn’t understand my coarse summary, please take the time to read his post as it’s needed to understand what goes wrong next.</p>
    <div>
      <h3>Constant time programming crash course</h3>
      <a href="#constant-time-programming-crash-course">
        
      </a>
    </div>
    <p>The “solution” for the problem of leaking information about the <i>padding length</i> value is to write the entire HMAC and padding check code to run in perfectly constant time. What I once heard Daniel J. Bernstein call “binding your hands behind your back and seeing what you can do with just your nose”. As you might imagine it’s not easy.</p><p>The idea is that you can’t use <code>if</code>s. So instead you store the results of your checks by doing bitwise <code>AND</code>s with a result variable which you set to 1 before you begin, and at which you only look at the end of the entire operation. If any of the checks returned 0, then the result will be 0, otherwise it will be 1. This way the attacker only learns “everything including padding and HMAC is good” or “nope”.</p><p>Not only that, but loop iterations can only depend on public data: if for example the message is 32 bytes long then the padding can be at most 32-1-20=11 bytes, so you have to check 20 (HMAC-SHA1) + 11 bytes. To ignore the bytes you don’t actually have to check, you use a <b>mask</b>: a value generated with constant time operations that will be <code>0xff</code> when the bytes are supposed to match and <code>0x00</code> when they are not. By <code>AND</code>ing the mask to the bitwise difference you get a value which is 0 when the values either match, or are not supposed to match.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1nbnTmVholEmauPKBYPl35/c36b1fa43df1459a31146c16ad3be0cc/S22C-6e16050319300.jpg" />
            
            </figure><p>This technique is crucial to this vulnerability, as it’s exactly its constant time, no-complaints nature that made the mistake possible.</p><p>The <a href="https://github.com/openssl/openssl/blob/cba792a1e941788cba7dc700a2ef59420e7f2522/crypto/evp/e_aes_cbc_hmac_sha1.c#L739-L763">OpenSSL checking code</a> was written after the discovery of <a href="http://www.isg.rhul.ac.uk/tls/TLStiming.pdf">Lucky13</a>, and Adam Langley wrote <a href="https://www.imperialviolet.org/2013/02/04/luckythirteen.html">a long blog post</a> explaining how the generic version works. I glossed over many details here, in particular the incredible pain that is generating the HMAC of a variable-length message in constant time (which is where Lucky13 and <a href="https://eprint.iacr.org/2015/1129.pdf">Lucky Microseconds</a> happened), so go read his incredible write-up to learn more.</p>
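<p>The accumulator idea can be illustrated in a few lines of Go (a toy sketch; the real OpenSSL code is C and also computes the masks that handle variable lengths, which this version doesn't):</p>

```go
package main

import "fmt"

// constantTimeEq returns 1 if x == y and 0 otherwise, without branching.
func constantTimeEq(x, y byte) byte {
	z := x ^ y         // 0 only if the bytes match
	z |= z >> 4        // fold any set bit...
	z |= z >> 2
	z |= z >> 1        // ...down into the lowest bit
	return (z & 1) ^ 1 // 1 when no bit was set, i.e. x == y
}

// checkTag compares a MAC tag against the expected value the way the text
// describes: every byte check is ANDed into a single result variable that
// is only inspected at the end, so the timing doesn't depend on where the
// first mismatch is. got and want are assumed to be the same length.
func checkTag(got, want []byte) bool {
	good := byte(1)
	for i := range got {
		good &= constantTimeEq(got[i], want[i])
	}
	return good == 1
}

func main() {
	fmt.Println(checkTag([]byte{1, 2, 3}, []byte{1, 2, 3})) // true
	fmt.Println(checkTag([]byte{1, 2, 3}, []byte{1, 9, 3})) // false
}
```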
    <div>
      <h3>Off by 20</h3>
      <a href="#off-by-20">
        
      </a>
    </div>
    <p>Like any respectable vulnerability analysis, let’s start with <a href="https://github.com/openssl/openssl/commit/70428eada9bc4cf31424d723d1f992baffeb0dfb">the patch</a>. We’ll focus on the SHA1 part for simplicity, but the SHA256 one is exactly the same.</p><p>The patch for <i>LuckyNegative20</i> is one line in the OpenSSL function that performs AES-CBC decryption and checks HMAC and padding. All it is doing is flagging the check as failed if the padding length value is higher than the maximum it could possibly be.</p>
            <pre><code>              pad = plaintext[len - 1];
              maxpad = len - (SHA_DIGEST_LENGTH + 1);
              maxpad |= (255 - maxpad) &gt;&gt; (sizeof(maxpad) * 8 - 8);
              maxpad &amp;= 255;
  
 +            ret &amp;= constant_time_ge(maxpad, pad);
 +</code></pre>
            <p>This patch points straight at how to trigger the vulnerability, since it can’t have any side effect: it’s constant time code after all! It means that if we do what the patch is there to detect—send a padding length byte higher than <code>maxpad</code>—we can sometimes pass the HMAC/padding check (!!), and use that fact as an oracle.</p><p><code>maxpad</code> is easily calculated: length of the plaintext, minus one byte of padding length, minus 20 bytes of HMAC-SHA1 (<code>SHA_DIGEST_LENGTH</code>), capped at 255. It’s a public value since it only depends on the <i>encrypted</i> data length. It’s also involved in deciding how many times to loop the HMAC/padding check, as we don’t want to go out of the bounds of the message buffer or waste time checking message bytes that couldn’t possibly be padding/HMAC.</p><p>So, how could <code>pad &gt; maxpad</code> allow us to pass the HMAC check fraudulently?</p><p>Let’s go back to our masks, this time thinking what happens if we send:</p>
            <pre><code>pad = maxpad + 1 = (len - 20 - 1) + 1 = (32 - 20 - 1) + 1 = 12</code></pre>
            
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4W0sqfy51XUKtMKyALPyKz/17d959f7ca4aaf9be1a460f3edbbd910/S22C-6e16050319300-2.jpg" />
            
            </figure><p>Uh oh. What happened there!? One byte of the HMAC checking mask fell out of the range we are actually checking! Nothing crashes because this is constant time code and it must tolerate these cases without any external hint. So, what if we push a little further…</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6Dnjxv3GwsnYb0Mlmd90mi/7e4acf897a76a897eef7fe38e33a662c/S22C-6e16050319300-3.jpg" />
            
            </figure><p>There. The HMAC mask is now 0 for all checking loop iterations, so we don’t have to worry about having a valid signature at all anymore, and we’ll pass the MAC/padding check unconditionally if all bytes in the message are valid padding bytes—that is, equal to the padding length byte.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4BpDOnIA9ejgpYAYSnK56r/79b181746f24dce28a72d156fc520780/S22C-6e16050319300-4.jpg" />
            
            </figure><p>I like to box and label these kinds of “cryptanalytic functions” and call them capabilities. Here’s our capability:</p><p><b>We can discover if a message is made entirely of bytes with value n, where n &gt;= maxpad + 20</b> by sending it to the server and observing an error different from BAD_MAC. This can only work with messages shorter than 256-20=236 bytes, as pad can be at most 255, being an 8-bit value.</p><p>NOTE: As <a href="http://web-in-security.blogspot.co.uk/2016/05/curious-padding-oracle-in-openssl-cve.html">Juraj clarifies</a>, there is no need to do delicate timing statistics like in Lucky 13, because if the tampered message is sent at the time the Finished message should be sent, the server response (BAD_MAC or not) will be unencrypted.</p><p>We can use this capability as an oracle.</p><p>In other words, the entire trick is making the computed payload length "negative" by at least <code>DIGEST_LENGTH</code>, so that the computed position of the HMAC mask is fully "outside" the message and the MAC ends up not being checked at all.</p><p>By the way, the fact that the affected function is called <code>aesni_cbc_hmac_sha1_cipher</code> explains why only servers with <a href="https://en.wikipedia.org/wiki/AES_instruction_set">AES-NI instructions</a> are vulnerable.</p><p><i>You can find a standalone, simplified version of the vulnerable function </i><a href="https://gist.github.com/FiloSottile/064cae24c11e792a46af881dfd826f76"><i>here</i></a><i>, if you want to play with the debugger yourself.</i></p>
    <div>
      <h3>From there to decryption</h3>
      <a href="#from-there-to-decryption">
        
      </a>
    </div>
    <p>Ok, we have a “capability”. But how do we go from there to decryption? The “classic” way involves focusing on one byte that we don’t know, preceded by bytes we control (in this case, a string of <code>31</code>). We then guess the unknown byte by XOR’ing the corresponding ciphertext byte in the previous block with a value <code>n</code>. We do this for many <code>n</code>. When the server replies “check passed”, we know that the mystery byte is <code>31 XOR n</code>.</p><p>Remember that this is CBC mode, so we can cut out any consecutive sequence of blocks and present it as its own ciphertext, and values XOR’d to a given ciphertext byte end up XOR’d to the corresponding plaintext byte in the next block. The diagram below should make this clearer.</p><p>In this scenario, we inject JavaScript in a plain-HTTP page the user has loaded in a different tab and send as many AJAX requests as we want to our target domain. Usually we use the <code>GET</code> path as the known controlled bytes and the headers as the decryption target.</p><p>However, this approach doesn’t work in this case because we need <i>two</i> blocks of consecutive equal plaintext bytes, and if we touch the ciphertext in block <code>x</code> to help decrypt block <code>x + 1</code>, we mangle the plaintext of block <code>x</code> irreparably.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1oMUox1ZJxyFNdezDD7N4v/124e57480a63374e9bd064608a0fd298/S22C-6e16050401360-1-1.jpg" />
            
            </figure><p>I’d like you to pause for a minute to realize that this is at all possible only because we:</p><ol><li><p>Allow unauthenticated (plain-HTTP) websites—and so any MitM—to run code on our machines.</p></li><li><p>Allow JavaScript to send requests to third-party origins (i.e. other websites) which carry our private cookies. Incidentally, this is also what enables CSRF.</p></li></ol><p>Back to our oracle, we can make it work the other way around: we align an unknown byte at the beginning of two blocks made of <code>31</code> (which we can send as POST body) and we XOR our <code>n</code> values with the first byte in the IV. Again, when the MAC check passes, we know that the target byte is <code>31 XOR n</code>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2tda3NbvCPrDOYBmz6fWbm/8b5091edfc56ab0cf9bc23495b6eba3e/S22C-6e16050401360-2.jpg" />
            
            </figure><p>We iterate this process, moving all bytes one position to the right (for example by making the POST path one character longer), and XOR’ing the first byte of the IV with our <code>n</code> guesses, and the second byte with the number we now know will turn it into a <code>31</code>.</p><p>And so on and so forth. This gets us to decrypt any 16 bytes that consistently appear just before a sequence of at least 31 attacker-controlled bytes and after a string of attacker-controlled length (for alignment).</p><p>It might be possible to extend the decryption to all blocks preceding the two controlled ones by carefully pausing the client request and using the early feedback, but I'm not sure it works. (Chances of me being wrong on full decryption capabilities are high, considering that an <a href="https://twitter.com/RichSalz/status/727527231686344705">OpenSSL team member said they are not aware of such a technique</a>.) UPDATE: we chatted a bit with Juraj, and no, it doesn't work.</p>
    <div>
      <h3>But no further</h3>
      <a href="#but-no-further">
        
      </a>
    </div>
    <p>One might think that since this boils down to a MAC check bypass, we might use it to inject unauthenticated messages into the connection. However, since we rely on making <code>pad</code> higher than <code>maxpad + DIGEST_LENGTH</code>, we can't avoid making the payload length (<code>totlen - 1 - DIGEST_LENGTH - pad</code>) negative.</p>
            <pre><code>pad &gt; maxpad + DIGEST_LENGTH = (totlen - DIGEST_LENGTH - 1) + DIGEST_LENGTH = totlen - 1 &gt; totlen - 1 - DIGEST_LENGTH
0 &gt; totlen - 1 - DIGEST_LENGTH - pad = payload_len</code></pre>
            <p>Since <code>length</code> is stored as an unsigned integer, when it goes negative it wraps around to a very high number <a href="https://github.com/openssl/openssl/blob/9f2ccf1d718ab66c778a623f9aed3cddf17503a2/ssl/s3_pkt.c#L550-L554">and triggers the "payload too long" error</a>, which terminates the connection. Too bad... Good. I meant good :-)</p><p>This means, by the way, that successful attempts to exploit or test this vulnerability show up in OpenSSL logs as</p>
            <pre><code>SSL routines:SSL3_GET_RECORD:data length too long:s3_pkt.c:</code></pre>
            
    <div>
      <h3>Testing for the vulnerability</h3>
      <a href="#testing-for-the-vulnerability">
        
      </a>
    </div>
    <p>Detecting a vulnerable server is as easy as sending an encrypted message which decrypts to <code>AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA</code>, and checking if the TLS alert is <code>DATA_LENGTH_TOO_LONG</code> (vulnerable) or <code>BAD_RECORD_MAC</code> (not vulnerable).</p><p>It works because <code>A</code> is 65, which is at least maxpad + 20 = (32 - 20 - 1) + 20 = 31.</p><p>There’s a simple implementation at <a href="https://github.com/FiloSottile/CVE-2016-2107">https://github.com/FiloSottile/CVE-2016-2107</a> and an online version at <a href="https://filippo.io/CVE-2016-2107/">https://filippo.io/CVE-2016-2107/</a>.</p>
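<p>Spelling out that arithmetic (with the constants hardcoded for a 32-byte record):</p>

```go
package main

import "fmt"

func main() {
	const (
		totlen       = 32 // length of the decrypted record
		digestLength = 20 // HMAC-SHA1
	)
	maxpad := totlen - digestLength - 1 // 11
	pad := int('A')                     // every byte, padding length included, is 65

	// pad >= maxpad + 20 pushes the whole HMAC mask out of range, so the
	// unpatched code skips the MAC check entirely.
	fmt.Println(pad >= maxpad+digestLength) // true
}
```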
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>To sum up the impact: when the connection uses AES-CBC (for example because the server or the client don’t support TLS 1.2 yet) and the server’s processor supports AES-NI, a skilled MitM attacker can recover at least 16 bytes of anything it can get the client to send repeatedly just before attacker-controlled data (like HTTP Cookies, using JavaScript cross-origin requests).</p><p>A more skilled attacker than me might also be able to decrypt more than 16 bytes, but no one has shown that it’s possible yet.</p><p>All CloudFlare websites are protected from this vulnerability. Connections from the CloudFlare edge to origins use TLS 1.2 and AES-GCM if the server supports it, so are safe in that case. Customers supporting only AES-CBC should upgrade as soon as possible.</p><p>In closing, the cryptographic development community has shown that writing secure constant-time MtE decryption code is extremely difficult to do. We can keep patching this code, but the safer thing to do is to move to cryptographic primitives like <a href="/go-crypto-bridging-the-performance-gap/">AEAD</a> that are designed not to require constant time acrobatics. Eventually, TLS 1.2 adoption will be high enough to give CBC the <a href="/killing-rc4-the-long-goodbye/">RC4</a> <a href="/end-of-the-road-for-rc4/">treatment</a>. But we are not there yet.</p><p><i>Are you crazy enough that analyzing a padding oracle in a complex crypto protocol sounds </i><b><i>exciting</i></b><i> to you? </i><a href="https://www.cloudflare.com/join-our-team/"><i>We are hiring in London, San Francisco, Austin and Singapore</i></a><i>, including cryptography roles.</i></p><p>Thanks to Anna Bernardi who helped with the analysis and the painful process of reading OpenSSL code. Thanks (in no particular order) to Nick Sullivan, Ryan Hodson, Evan Johnson, Joshua A. Kroll, John Graham-Cumming and Alessandro Ghedini for their help with this write-up.</p><hr />
            <p>[^1]: Although JGC insists it should be called FreezerBurn because -20, geddit?</p>
             ]]></content:encoded>
            <category><![CDATA[TLS]]></category>
            <category><![CDATA[Vulnerabilities]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[SSL]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">1TPtOeFIyWKh61mmkpakWc</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Building the simplest Go static analysis tool]]></title>
            <link>https://blog.cloudflare.com/building-the-simplest-go-static-analysis-tool/</link>
            <pubDate>Wed, 27 Apr 2016 15:01:15 GMT</pubDate>
            <description><![CDATA[ Go native vendoring (a.k.a. GO15VENDOREXPERIMENT) allows you to freeze dependencies by putting them in a vendor folder in your project. The compiler will then look there before searching the GOPATH. ]]></description>
            <content:encoded><![CDATA[ <p><a href="https://docs.google.com/document/d/1Bz5-UB7g2uPBdOx-rw5t9MxJwkfpx90cqG9AFL0JAYo/edit">Go native vendoring</a> (a.k.a. GO15VENDOREXPERIMENT) allows you to freeze dependencies by putting them in a <code>vendor</code> folder in your project. The compiler will then look there before searching the GOPATH.</p><p>The only annoyance compared to using a per-project GOPATH, which is what we used to do, is that you might forget to vendor a package that you have in your GOPATH. The program will build for you, but it won't for anyone else. Back to the <a href="https://www.urbandictionary.com/define.php?term=wfm">WFM</a> times!</p><p>I decided I wanted something, a tool, to check that all my (non-stdlib) dependencies were vendored.</p><p>At first I thought of using <a href="https://golang.org/cmd/go/#hdr-List_packages"><code>go list</code></a>, which Dave Cheney appropriately called a <a href="http://dave.cheney.net/2014/09/14/go-list-your-swiss-army-knife">swiss army knife</a>, but while it can show the entire recursive dependency tree (format <code>.Deps</code>), there's no way to know from the templating engine if a dependency is in the standard library.</p><p>We could just pass each output back into <code>go list</code> to check for <code>.Standard</code>, but I thought this would be a good occasion to build a very simple static analysis tool. Go's simplicity and libraries make it a very easy task, as you will see.</p>
    <div>
      <h3>First, loading the program</h3>
      <a href="#first-loading-the-program">
        
      </a>
    </div>
    <p>We use <a href="https://godoc.org/golang.org/x/tools/go/loader"><code>golang.org/x/tools/go/loader</code></a> to load the packages passed as arguments on the command line, including the test files based on a flag.</p>
            <pre><code>var conf loader.Config
for _, p := range flag.Args() {
    if *tests {
        conf.ImportWithTests(p)
    } else {
        conf.Import(p)
    }
}
prog, err := conf.Load()
if err != nil {
    log.Fatal(err)
}
for p := range prog.AllPackages {
    fmt.Println(p.Path())
}</code></pre>
            <p>With these few lines we already replicated <code>go list -f {{ .Deps }}</code>!</p><p>The only missing loading feature here is wildcard (<code>./...</code>) support. That code <a href="https://github.com/golang/go/blob/87bca88c703c1f14fe8473dc2f07dc521cf2b989/src/cmd/go/main.go#L365">is in the go tool source</a> and it's unexported. There's an <a href="https://github.com/golang/go/issues/8768">issue</a> about exposing it, but for now packages <a href="https://github.com/golang/lint/blob/58f662d2fc0598c6c36a92ae29af1caa6ec89d7a/golint/import.go">are just copy-pasting it</a>. We'll use a packaged version of that code, <a href="https://github.com/kisielk/gotool"><code>github.com/kisielk/gotool</code></a>:</p>
            <pre><code>for _, p := range gotool.ImportPaths(flag.Args()) {</code></pre>
            <p>Finally, since we are only interested in the dependency tree today, we instruct the parser to go only as far as the import statements and we ignore the resulting "not used" errors:</p>
            <pre><code>conf.ParserMode = parser.ImportsOnly
conf.AllowErrors = true
conf.TypeChecker.Error = func(error) {}</code></pre>
            
    <div>
      <h3>Then, the actual logic</h3>
      <a href="#then-the-actual-logic">
        
      </a>
    </div>
    <p>We now have a <code>loader.Program</code> object, which holds references to various <code>loader.PackageInfo</code> objects, which in turn are a combination of package, AST and types information. All you need to perform any kind of complex analysis. Not that we are going to do that today :)</p><p>We'll just replicate <a href="https://github.com/golang/go/blob/87bca88c703c1f14fe8473dc2f07dc521cf2b989/src/cmd/go/pkg.go#L183-L194">the <code>go list</code> logic to recognize stdlib packages</a> and remove the packages passed on the command line from the list:</p>
            <pre><code>initial := make(map[*loader.PackageInfo]bool)
for _, pi := range prog.InitialPackages() {
    initial[pi] = true
}

var packages []*loader.PackageInfo
for _, pi := range prog.AllPackages {
    if initial[pi] {
        continue
    }
    if len(pi.Files) == 0 {
        continue // virtual stdlib package
    }
    filename := prog.Fset.File(pi.Files[0].Pos()).Name()
    if !strings.HasPrefix(filename, build.Default.GOROOT) ||
        !isStandardImportPath(pi.Pkg.Path()) {
        packages = append(packages, pi)
    }
}</code></pre>
            <p>Then we just have to print a warning if any remaining package is not in a <code>/vendor/</code> folder:</p>
            <pre><code>for _, pi := range packages {
    if !strings.Contains(pi.Pkg.Path(), "/vendor/") {
        fmt.Println("[!] dependency not vendored:", pi.Pkg.Path())
    }
}</code></pre>
            <p>Done! You can find the tool here: <a href="https://github.com/FiloSottile/vendorcheck">https://github.com/FiloSottile/vendorcheck</a></p>
    <div>
      <h3>Further reading</h3>
      <a href="#further-reading">
        
      </a>
    </div>
    <p><a href="https://github.com/golang/example/tree/master/gotypes#gotypes-the-go-type-checker">This document</a>, maintained by Alan Donovan, will tell you more than I'll ever know about static analysis tooling.</p><p>Note that you might be tempted to use <code>go/importer</code> and <code>types.Importer[From]</code> instead of <code>x/go/loader</code>. Don't do that. That doesn't load the source but reads compiled <code>.a</code> files, which <b>can be stale or missing</b>. Static analysis tools that spit out "package not found" for existing packages or, worse, incorrect results because of this are a pet peeve of mine.</p><p><i>If you now feel the urge to write static analysis tools, know that the CloudFlare Go team </i><a href="https://www.cloudflare.com/join-our-team/"><i>is hiring in London, San Francisco and Singapore</i></a><i>!</i></p> ]]></content:encoded>
            <category><![CDATA[Tools]]></category>
            <category><![CDATA[Go]]></category>
            <category><![CDATA[Programming]]></category>
            <guid isPermaLink="false">7f5NBXh02bwJ9WyQmBdtZK</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Go coverage with external tests]]></title>
            <link>https://blog.cloudflare.com/go-coverage-with-external-tests/</link>
            <pubDate>Tue, 19 Jan 2016 18:19:59 GMT</pubDate>
            <description><![CDATA[ The Go test coverage implementation is quite ingenious: when asked to, the Go compiler will preprocess the source so that when each code portion is executed a bit is set in a coverage bitmap. ]]></description>
            <content:encoded><![CDATA[ <p>The Go test coverage implementation is <a href="https://blog.golang.org/cover">quite ingenious</a>: when asked to, the Go compiler will preprocess the source so that when each code portion is executed a bit is set in a coverage bitmap. This is integrated in the <code>go test</code> tool: <code>go test -cover</code> enables it and <code>-coverprofile=</code> allows you to write a profile to then inspect with <code>go tool cover</code>.</p><p>This makes it very easy to get unit test coverage, but <b>there's no simple way to get coverage data for tests that you run against the main version of your program, like end-to-end tests</b>.</p><p>The proper fix would involve adding <code>-cover</code> preprocessing support to <code>go build</code>, and exposing the coverage profile maybe as a <code>runtime/pprof.Profile</code>, but as of Go 1.6 there’s no such support. Here instead is a hack we've been using for a while in the test suite of <a href="/tag/rrdns/">RRDNS</a>, our custom Go DNS server.</p><p>We create a <b>dummy test</b> that executes <code>main()</code>, we put it behind a build tag, compile a binary with <code>go test -c -cover</code> and then run only that test instead of running the regular binary.</p><p>Here's what the <code>rrdns_test.go</code> file looks like:</p>
            <pre><code>// +build testrunmain

package main

import "testing"

func TestRunMain(t *testing.T) {
	main()
}</code></pre>
            <p>We compile the binary like this</p>
            <pre><code>$ go test -coverpkg="rrdns/..." -c -tags testrunmain rrdns</code></pre>
            <p>And then when we want to collect coverage information, we execute this instead of <code>./rrdns</code> (and run our test battery as usual):</p>
            <pre><code>$ ./rrdns.test -test.run "^TestRunMain$" -test.coverprofile=system.out</code></pre>
            <p>You must return from <code>main()</code> cleanly for the profile to be written to disk; in RRDNS we do that by catching SIGINT. You can still use command line arguments and standard input normally, just note that you will get two lines of extra output from the test framework.</p><p>Finally, since you probably also run unit tests, you might want to merge the coverage profiles with <a href="https://github.com/wadey/gocovmerge">gocovmerge</a> (from <a href="https://github.com/golang/go/issues/6909#issuecomment-124185553">issue #6909</a>):</p>
            <pre><code>$ go get github.com/wadey/gocovmerge
$ gocovmerge unit.out system.out &gt; all.out
$ go tool cover -html all.out</code></pre>
            <p>If finding creative ways to test big-scale network services sounds fun, know that <a href="https://www.cloudflare.com/join-our-team/">we are hiring in London, San Francisco and Singapore</a>.</p> ]]></content:encoded>
            <category><![CDATA[RRDNS]]></category>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Programming]]></category>
            <category><![CDATA[Go]]></category>
            <category><![CDATA[Best Practices]]></category>
            <guid isPermaLink="false">285AT2igoBQiRZFlDLnGsZ</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Creative foot-shooting with Go RWMutex]]></title>
            <link>https://blog.cloudflare.com/creative-foot-shooting-with-go-rwmutex/</link>
            <pubDate>Thu, 29 Oct 2015 21:26:36 GMT</pubDate>
            <description><![CDATA[ Hi, I'm Filippo and today I managed to surprise myself! (And not in a good way.)

I'm developing a new module ("filter" as we call them) for RRDNS, CloudFlare's Go DNS server.  ]]></description>
            <content:encoded><![CDATA[ <p>Hi, I'm Filippo and today I managed to surprise myself! (And not in a good way.)</p><p>I'm developing a new module ("filter" as we call them) for <a href="/tag/rrdns/">RRDNS</a>, CloudFlare's Go DNS server. It's a rewrite of the authoritative module, the one that adds the IP addresses to DNS answers.</p><p>It has a table of CloudFlare IPs that looks like this:</p>
            <pre><code>type IPMap struct {
	sync.RWMutex
	M map[string][]net.IP
}</code></pre>
            <p>It's a global filter attribute:</p>
            <pre><code>type V2Filter struct {
	name       string
	IPTable    *IPMap
	// [...]
}</code></pre>
            
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4YvPr9fRpSXLyFVZ4BL2lr/b11623aba2d210ffac4dcc6a11f43b32/1280px-Mexican_Standoff.jpg" />
            
            </figure><p><a href="https://www.flickr.com/photos/28293006@N05/8144747570">CC-BY-NC-ND image by Martin SoulStealer</a></p><p>The table changes often, so a background goroutine periodically reloads it from our distributed key-value store, acquires the lock (<code>f.IPTable.Lock()</code>), updates it and releases the lock (<code>f.IPTable.Unlock()</code>). This happens every 5 minutes.</p><p>Everything worked in tests, including multiple and concurrent requests.</p><p>Today we deployed to an off-production test machine and everything worked. For a few minutes. Then RRDNS stopped answering queries for the beta domains served by the new code.</p><p>What. <i>That worked on my laptop</i>™.</p><p>Here's the IPTable consumer function. You can probably spot the bug.</p>
            <pre><code>func (f *V2Filter) getCFAddr(...) (result []dns.RR) {
	f.IPTable.RLock()
	// [... append IPs from f.IPTable.M to result ...]
	return
}</code></pre>
            <p><code>f.IPTable.RUnlock()</code> is never called. Whoops. But it's an RLock, so multiple <code>getCFAddr</code> calls should work, and only table reloading should break, no? Instead <code>getCFAddr</code> started blocking after a few minutes. To the docs!</p><p><i>To ensure that the lock eventually becomes available, a blocked Lock call excludes new readers from acquiring the lock.</i> <a href="https://golang.org/pkg/sync/#RWMutex.Lock">https://golang.org/pkg/sync/#RWMutex.Lock</a></p><p>So everything worked and RLocks piled up until the table reload function ran; then the pending Lock call caused all following RLock calls to block, breaking RRDNS answer generation.</p><p>In tests the table reload function never ran while answering queries, so <code>getCFAddr</code> kept piling up RLock calls but never blocked.</p><p>No customers were affected because A) the release was still being tested on off-production machines and B) no real customers were running on the new code yet. Anyway, it was an interesting way to cause a deferred deadlock.</p><p>In closing, there's probably room for better tooling here. A static analysis tool might output a listing of all Lock/Unlock calls, and a dynamic analysis tool might report Mutexes still [r]locked at the end of tests. (Or maybe these tools already exist, in which case let me know!)</p><p><i>Do you want to help (introduce </i><code><i>:)</i></code><i> and) fix bugs in the DNS server answering more than 50 billion queries every day? </i><a href="https://www.cloudflare.com/join-our-team"><i>We are hiring in London, San Francisco and Singapore!</i></a></p> ]]></content:encoded>
            <category><![CDATA[RRDNS]]></category>
            <category><![CDATA[Bugs]]></category>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Programming]]></category>
            <category><![CDATA[Go]]></category>
            <guid isPermaLink="false">5iwarGrOuKY1ZIhN7o19u9</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[DNS parser, meet Go fuzzer]]></title>
            <link>https://blog.cloudflare.com/dns-parser-meet-go-fuzzer/</link>
            <pubDate>Thu, 06 Aug 2015 13:40:40 GMT</pubDate>
            <description><![CDATA[ Here at CloudFlare we are heavy users of the github.com/miekg/dns Go DNS library and we make sure to contribute to its development as much as possible. Therefore when Dmitry Vyukov published go-fuzz and started to uncover tens of bugs in the Go standard library, our task was clear. ]]></description>
            <content:encoded><![CDATA[ <p>Here at CloudFlare we are heavy users of the <a href="https://github.com/miekg/dns"><code>github.com/miekg/dns</code></a> Go DNS library and we make sure to contribute to its development as much as possible. Therefore when <a href="https://github.com/dvyukov">Dmitry Vyukov</a> published go-fuzz and started to uncover tens of bugs in the Go standard library, our task was clear.</p>
    <div>
      <h3>Hot Fuzz</h3>
    </div>
    <p>Fuzzing is the technique of <i>testing software by continuously feeding it inputs that are automatically mutated</i>. For C/C++, the wildly successful <a href="http://lcamtuf.coredump.cx/afl/">afl-fuzz</a> tool by Michał Zalewski uses instrumented source coverage to judge which mutations pushed the program into new paths, <i>eventually hitting many rarely-tested branches</i>.</p><p><a href="https://github.com/dvyukov/go-fuzz"><i>go-fuzz</i></a><i> applies the same technique to Go programs</i>, instrumenting the source by rewriting it (<a href="/go-has-a-debugger-and-its-awesome/">like godebug does</a>). An interesting difference between afl-fuzz and go-fuzz is that the former normally operates on file inputs to unmodified programs, while the latter asks you to <i>write a Go function and passes inputs to that</i>. The former usually forks a new process for each input, while the latter keeps calling the function without restarting often.</p><p>There is no strong technical reason for this difference (and indeed afl recently gained the ability to behave like go-fuzz), but it's likely due to the <i>different ecosystems</i> in which they operate: Go programs often expose <i>well-documented, well-behaved APIs</i> which enable the tester to write a good wrapper that doesn't contaminate state across calls. Also, Go programs are often easier to dive into and <i>more predictable</i>, thanks obviously to GC and memory management, but also to the general community repulsion towards unexpected global states and side effects. On the other hand, many legacy C code bases are so intractable that the easy and stable file input interface is worth the performance tradeoff.</p><p>Back to our DNS library. RRDNS, our in-house DNS server, uses <code>github.com/miekg/dns</code> for all its parsing needs, and it has proved to be up to the task. However, it's a bit fragile on the edge cases and has a track record of panicking on malformed packets. 
Thankfully, this is Go, not <a href="/a-deep-look-at-cve-2015-5477-and-how-cloudflare-virtual-dns-customers-are-protected/">BIND</a> C, and we can afford to <code>recover()</code> panics without worrying about ending up with insane memory states. Here's what we are doing:</p>
            <pre><code>func ParseDNSPacketSafely(buf []byte, msg *old.Msg) (err error) {
	defer func() {
		panicked := recover()

		if panicked != nil {
			err = errors.New("ParseError")
		}
	}()

	err = msg.Unpack(buf)

	return
}</code></pre>
            <p>We saw an opportunity to make the library more robust so we wrote this initial simple fuzzing function:</p>
            <pre><code>func Fuzz(rawMsg []byte) int {
    msg := &amp;dns.Msg{}

    if unpackErr := msg.Unpack(rawMsg); unpackErr != nil {
        return 0
    }

    if _, packErr := msg.Pack(); packErr != nil {
        println("failed to pack back a message")
        spew.Dump(msg)
        panic(packErr)
    }

    return 1
}</code></pre>
            <p>To create a corpus of initial inputs we took our stress and regression test suites and used <code>github.com/miekg/pcap</code> to write a file per packet.</p>
            <pre><code>package main

import (
	"crypto/rand"
	"encoding/hex"
	"log"
	"os"
	"strconv"

	"github.com/miekg/pcap"
)

func fatalIfErr(err error) {
	if err != nil {
		log.Fatal(err)
	}
}

func main() {
	handle, err := pcap.OpenOffline(os.Args[1])
	fatalIfErr(err)

	b := make([]byte, 4)
	_, err = rand.Read(b)
	fatalIfErr(err)
	prefix := hex.EncodeToString(b)

	i := 0
	for pkt := handle.Next(); pkt != nil; pkt = handle.Next() {
		pkt.Decode()

		f, err := os.Create("p_" + prefix + "_" + strconv.Itoa(i))
		fatalIfErr(err)
		_, err = f.Write(pkt.Payload)
		fatalIfErr(err)
		fatalIfErr(f.Close())

		i++
	}
}</code></pre>
            
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1JJf54JXErYsYRprnQO8nk/aec6c0359fc16d482d96bea2d37e8c3c/11597106396_a1927f8c71_z.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a> <a href="https://www.flickr.com/photos/jdhancock/11597106396/in/photolist-iENcGh-5GuU8g-4G3SzJ-cybzvf-ej9ytf-5PT2gy-2wCHkp-oTNLKN-4T5TVk-pikg74-64fbtb-64fbny-6iPZrk-6WSbWA-gTwR9P-6JEbMJ-uS5Qoe-p3LoLt-8rTRPb-gzJBbc-6u4Ko7-4uXbz8-bX4rtL-6HoBT8-cybFb7-pDtnkY-doskG2-a9tSqx-3NX4E-978gS2-4iW5fs-4VhKK2-7EpKqc-7EtB6Y-7EtAiN-7EpH54-7EpGgX-poTg8-55WEef-qfzP-dt83Gq-naDJvs-aCKhDG-drR492-aTSFS6-aTSEER-aTSC2t-aTSAUR-qqFXz2-ftsXnd">image</a> by <a href="https://www.flickr.com/photos/jdhancock/">JD Hancock</a></p><p>We then compiled our <code>Fuzz</code> function with go-fuzz, and launched the fuzzer on a lab server. The first thing go-fuzz does is minimize the corpus by throwing away packets that trigger the same code paths, then it starts mutating the inputs and passing them to <code>Fuzz()</code> in a loop. The mutations that don't fail (<code>return 1</code>) and <i>expand code coverage</i> are kept and iterated over. When the program panics, a small report (input and output) is saved and the program restarted. If you want to learn more about go-fuzz watch <a href="https://www.youtube.com/watch?v=a9xrxRsIbSU">the author's GopherCon talk</a> or read <a href="https://github.com/dvyukov/go-fuzz">the README</a>.</p><p><i>Crashes, mostly "index out of bounds", started to surface.</i> go-fuzz becomes pretty slow and ineffective when the program crashes often, so while the CPUs burned I started fixing the bugs.</p><p>In some cases I just decided to change some parser patterns, for example <a href="https://github.com/miekg/dns/commit/b5133fead4c0571c20eea405a917778f011dde02">reslicing and using <code>len()</code> instead of keeping offsets</a>. 
However, these can be potentially disruptive changes (I'm far from perfect), so I adapted the Fuzz function to keep an eye on the differences between the old parser and the new, fixed one, and crash if the new parser started refusing good packets or changed its behavior:</p>
            <pre><code>func Fuzz(rawMsg []byte) int {
    var (
        msg, msgOld = &amp;dns.Msg{}, &amp;old.Msg{}
        buf, bufOld = make([]byte, 100000), make([]byte, 100000)
        res, resOld []byte

        unpackErr, unpackErrOld error
        packErr, packErrOld     error
    )

    unpackErr = msg.Unpack(rawMsg)
    unpackErrOld = ParseDNSPacketSafely(rawMsg, msgOld)

    if unpackErr != nil &amp;&amp; unpackErrOld != nil {
        return 0
    }

    if unpackErr != nil &amp;&amp; unpackErr.Error() == "dns: out of order NSEC block" {
        // 97b0a31 - rewrite NSEC bitmap [un]packing to account for out-of-order
        return 0
    }

    if unpackErr != nil &amp;&amp; unpackErr.Error() == "dns: bad rdlength" {
        // 3157620 - unpackStructValue: drop rdlen, reslice msg instead
        return 0
    }

    if unpackErr != nil &amp;&amp; unpackErr.Error() == "dns: bad address family" {
        // f37c7ea - Reject a bad EDNS0_SUBNET family on unpack (not only on pack)
        return 0
    }

    if unpackErr != nil &amp;&amp; unpackErr.Error() == "dns: bad netmask" {
        // 6d5de0a - EDNS0_SUBNET: refactor netmask handling
        return 0
    }

    if unpackErr != nil &amp;&amp; unpackErrOld == nil {
        println("new code fails to unpack valid packets")
        panic(unpackErr)
    }

    res, packErr = msg.PackBuffer(buf)

    if packErr != nil {
        println("failed to pack back a message")
        spew.Dump(msg)
        panic(packErr)
    }

    if unpackErrOld == nil {

        resOld, packErrOld = msgOld.PackBuffer(bufOld)

        if packErrOld == nil &amp;&amp; !bytes.Equal(res, resOld) {
            println("new code changed behavior of valid packets:")
            println()
            println(hex.Dump(res))
            println(hex.Dump(resOld))
            os.Exit(1)
        }

    }

    return 1
}</code></pre>
            <p>I was pretty happy about the robustness gain, but since we used the <code>ParseDNSPacketSafely</code> wrapper in RRDNS I didn't expect to find security vulnerabilities. I was wrong!</p><p>DNS names are made of labels, usually shown separated by dots. In a space-saving effort, labels can be replaced by pointers to other names, so that if we know we encoded <code>example.com</code> at offset 15, <code>www.example.com</code> can be packed as <code>www.</code> + <i>PTR(15)</i>. What we found was <a href="https://github.com/FiloSottile/dns/commit/b364f94">a bug in the handling of pointers to empty names</a>: when encountering the end of a name (<code>0x00</code>), if no labels had been read, <code>"."</code> (the empty name) was returned as a special case. The problem is that this special case was unaware of pointers, and it would instruct the parser to resume reading from the end of the pointed-to empty name instead of the end of the original name.</p><p>For example, if the parser encountered at offset 60 a pointer to offset 15, and <code>msg[15] == 0x00</code>, parsing would then resume from offset 16 instead of 61, causing an infinite loop. This is a potential Denial of Service vulnerability.</p>
            <pre><code>A) Parse up to position 60, where a DNS name is found

| ... |  15  |  16  |  17  | ... |  58  |  59  |  60  |  61  |
| ... | 0x00 |      |      | ... |      |      | -&gt;15 |      |

-------------------------------------------------&gt;     

B) Follow the pointer to position 15

| ... |  15  |  16  |  17  | ... |  58  |  59  |  60  |  61  |
| ... | 0x00 |      |      | ... |      |      | -&gt;15 |      |

         ^                                        |
         ------------------------------------------      

C) Return a empty name ".", special case triggers

D) Erroneously resume from position 16 instead of 61

| ... |  15  |  16  |  17  | ... |  58  |  59  |  60  |  61  |
| ... | 0x00 |      |      | ... |      |      | -&gt;15 |      |

                 --------------------------------&gt;   

E) Rinse and repeat</code></pre>
            <p>We sent the fixes privately to the library maintainer while we patched our servers and we <a href="https://github.com/miekg/dns/pull/237">opened a PR</a> once done. (Two bugs were independently found and fixed by Miek while we released our RRDNS updates, as it happens.)</p>
    <div>
      <h3>Not just crashes and hangs</h3>
    </div>
    <p>Thanks to its flexible fuzzing API, go-fuzz lends itself nicely not only to the mere search for crashing inputs, but <i>can be used to explore all scenarios where edge cases are troublesome</i>.</p><p>Useful applications range from checking output validation by adding crashing assertions to your <code>Fuzz()</code> function, to comparing the two ends of an unpack-pack chain, and even comparing the behavior of two different versions or implementations of the same functionality.</p><p>For example, while preparing our <a href="/tag/dnssec/">DNSSEC</a> engine for launch, I faced a weird bug that would happen only in production or under stress tests: <i>NSEC records that were supposed to have only a couple of bits set in their types bitmap would sometimes look like this:</i></p>
            <pre><code>deleg.filippo.io.  IN  NSEC    3600    \000.deleg.filippo.io. NS WKS HINFO TXT AAAA LOC SRV CERT SSHFP RRSIG NSEC TLSA HIP TYPE60 TYPE61 SPF</code></pre>
            <p>The catch was that our "pack and send" code <i>pools </i><code><i>[]byte</i></code><i> buffers to reduce GC and allocation churn</i>, so buffers passed to <code>dns.Msg.PackBuffer(buf []byte)</code> can be "dirty" from previous uses.</p>
            <pre><code>var bufpool = sync.Pool{
    New: func() interface{} {
        return make([]byte, 0, 2048)
    },
}

[...]

    data := bufpool.Get().([]byte)
    defer bufpool.Put(data)

    if data, err = r.Response.PackBuffer(data); err != nil {</code></pre>
            <p>However, <code>buf</code> not being an array of zeroes was not handled by some <code>github.com/miekg/dns</code> packers, including the NSEC rdata one, which would <i>just OR in the present bits, without clearing the ones that are supposed to be absent</i>.</p>
            <pre><code>case `dns:"nsec"`:
    lastwindow := uint16(0)
    length := uint16(0)
    for j := 0; j &lt; val.Field(i).Len(); j++ {
        t := uint16((fv.Index(j).Uint()))
        window := uint16(t / 256)
        if lastwindow != window {
            off += int(length) + 3
        }
        length = (t - window*256) / 8
        bit := t - (window * 256) - (length * 8)

        msg[off] = byte(window) // window #
        msg[off+1] = byte(length + 1) // octets length

        // Setting the bit value for the type in the right octet
---&gt;    msg[off+2+int(length)] |= byte(1 &lt;&lt; (7 - bit)) 

        lastwindow = window
    }
    off += 2 + int(length)
    off++
}</code></pre>
            <p>The fix was clear and easy: we benchmarked a few different ways to zero a buffer and updated the code like this:</p>
            <pre><code>// zeroBuf is a big buffer of zero bytes, used to zero out the buffers passed
// to PackBuffer.
var zeroBuf = make([]byte, 65535)

var bufpool = sync.Pool{
    New: func() interface{} {
        return make([]byte, 0, 2048)
    },
}

[...]

    data := bufpool.Get().([]byte)
    defer bufpool.Put(data)
    copy(data[0:cap(data)], zeroBuf)

    if data, err = r.Response.PackBuffer(data); err != nil {</code></pre>
            <p>Note: <a href="https://github.com/golang/go/commit/f03c9202c43e0abb130669852082117ca50aa9b1">a recent optimization</a> turns zeroing range loops into <code>memclr</code> calls, so once Go 1.5 lands that will be much faster than <code>copy()</code>.</p><p>But this was a boring fix! Wouldn't it be nicer if we could trust our library to work with any buffer we pass it? Luckily, this is exactly what coverage-based fuzzing is good for: <i>making sure all code paths behave in a certain way</i>.</p><p>What I did then was write a <code>Fuzz()</code> function that would first parse a message, and then pack it to two different buffers: one filled with zeroes and one filled with non-zero bytes. <i>Any difference between the two results would signal cases where the underlying buffer is leaking into the output.</i></p>
            <pre><code>func Fuzz(rawMsg []byte) int {
    var (
        msg         = &amp;dns.Msg{}
        buf, bufOne = make([]byte, 100000), make([]byte, 100000)
        res, resOne []byte

        unpackErr, packErr error
    )

    if unpackErr = msg.Unpack(rawMsg); unpackErr != nil {
        return 0
    }

    if res, packErr = msg.PackBuffer(buf); packErr != nil {
        return 0
    }

    for i := range res {
        bufOne[i] = 1
    }

    resOne, packErr = msg.PackBuffer(bufOne)
    if packErr != nil {
        println("Pack failed only with a filled buffer")
        panic(packErr)
    }

    if !bytes.Equal(res, resOne) {
        println("buffer bits leaked into the packed message")
        println(hex.Dump(res))
        println(hex.Dump(resOne))
        os.Exit(1)
    }

    return 1
}</code></pre>
            <p>I wish I could show here, too, a PR fixing all the bugs, but go-fuzz did its job all too well and we are still triaging and fixing what it finds.</p><p>Anyway, once the fixes are done and go-fuzz falls silent, we will be free to drop the buffer zeroing step without worry, with no need to audit the whole codebase!</p><p><i>Do you fancy fuzzing the libraries that serve 43 billion queries per day? We are </i><a href="https://www.cloudflare.com/join-our-team"><i>hiring</i></a><i> in London, San Francisco and Singapore!</i></p> ]]></content:encoded>
            <category><![CDATA[RRDNS]]></category>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Tools]]></category>
            <category><![CDATA[Go]]></category>
            <guid isPermaLink="false">7zu5Cq14O6t3QJfjOHY6b7</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[A deep look at CVE-2015-5477 and how CloudFlare Virtual DNS customers are protected]]></title>
            <link>https://blog.cloudflare.com/a-deep-look-at-cve-2015-5477-and-how-cloudflare-virtual-dns-customers-are-protected/</link>
            <pubDate>Tue, 04 Aug 2015 10:36:24 GMT</pubDate>
            <description><![CDATA[ Last week ISC published a patch for a critical remotely exploitable vulnerability in the BIND9 DNS server capable of causing a crash with a single packet.

 ]]></description>
            <content:encoded><![CDATA[ <p>Last week ISC <a href="https://kb.isc.org/article/AA-01272">published</a> a patch for a critical remotely exploitable vulnerability in the BIND9 DNS server capable of causing a crash with a single packet.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6WoX0CyBmbXc0MHG4MIfyZ/928cd2fba9294bdddcb0f11bfd5dd6df/8567150970_df04ccbee3_z.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a> <a href="https://www.flickr.com/photos/rarvesen/8566054615/in/album-72157633018017313/">image</a> by <a href="https://www.flickr.com/photos/rarvesen/">Ralph Aversen</a></p><p>The public summary tells us that a mistake in the handling of queries for the TKEY type causes an assertion to fail, which in turn crashes the server. Since the assertion happens during query parsing, there is no way to avoid it: it's the first thing that happens on receiving a packet, before any decision is made about what to do with it.</p><p><a href="https://tools.ietf.org/html/rfc2930">TKEY queries</a> are used in the context of <a href="https://tools.ietf.org/html/rfc2845">TSIG</a>, a protocol DNS servers can use to authenticate to each other. They are special in that, unlike normal DNS queries, they include a “meta” record (of type TKEY) in the EXTRA/ADDITIONAL section of the message.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/Wu8uM3jN01IT813j8A6Hz/905ae629b6a09fd7e1d615a32cad55bf/8567150708_a63cd2cc2b_z.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a> <a href="https://www.flickr.com/photos/rarvesen/8566054615/in/album-72157633018017313/">image</a> by <a href="https://www.flickr.com/photos/rarvesen/">Ralph Aversen</a></p><p>Since the exploit packet is now public, I thought we might take a dive and look at the vulnerable code. Let's start by taking a look at the output of a crashing instance:</p>
            <pre><code>03-Aug-2015 16:38:55.509 message.c:2352: REQUIRE(*name == ((void*)0)) failed, back trace
03-Aug-2015 16:38:55.510 #0 0x10001510d in assertion_failed()+0x5d
03-Aug-2015 16:38:55.510 #1 0x1001ee56a in isc_assertion_failed()+0xa
03-Aug-2015 16:38:55.510 #2 0x1000bc31d in dns_message_findname()+0x1ad
03-Aug-2015 16:38:55.510 #3 0x10017279c in dns_tkey_processquery()+0xfc
03-Aug-2015 16:38:55.510 #4 0x100016945 in ns_query_start()+0x695
03-Aug-2015 16:38:55.510 #5 0x100008673 in client_request()+0x18d3
03-Aug-2015 16:38:55.510 #6 0x1002125fe in run()+0x3ce
03-Aug-2015 16:38:55.510 exiting (due to assertion failure)
[1]    37363 abort (core dumped)  ./bin/named/named -f -c named.conf</code></pre>
            <p>This is extremely helpful: after all, this is a controlled crash caused by a failed assertion, so it tells us what failed and where: <code>message.c:2352</code>. Here's the excerpt.</p>
            <pre><code>// https://source.isc.org/git/bind9.git -- faa3b61 -- lib/dns/message.c

    isc_result_t
    dns_message_findname(dns_message_t *msg, dns_section_t section,
                 dns_name_t *target, dns_rdatatype_t type,
                 dns_rdatatype_t covers, dns_name_t **name,
                 dns_rdataset_t **rdataset)
    {
        dns_name_t *foundname;
        isc_result_t result;
    
        /*
         * XXX These requirements are probably too intensive, especially
         * where things can be NULL, but as they are they ensure that if
         * something is NON-NULL, indicating that the caller expects it
         * to be filled in, that we can in fact fill it in.
         */
        REQUIRE(msg != NULL);
        REQUIRE(VALID_SECTION(section));
        REQUIRE(target != NULL);
        if (name != NULL)
==&gt;         REQUIRE(*name == NULL);

    [...]</code></pre>
            <p>What we have here is a function, <code>dns_message_findname</code>, that searches for an RRset with the given name and type in the given message section. It employs a really common C API pattern: to get the results, the caller passes pointers that will be filled in (<code>dns_name_t **name, dns_rdataset_t **rdataset</code>).</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1vYrM6VfswE12W68qh9f5m/1f18e1d7bc0576003ac7c3979586fb03/8566054615_c1c58976a3_z.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a> <a href="https://www.flickr.com/photos/rarvesen/8566054615/in/album-72157633018017313/">image</a> by <a href="https://www.flickr.com/photos/rarvesen/">Ralph Aversen</a></p><p>As the big comment ironically acknowledges, it's really strict when validating these pointers: if they don't point to <code>(dns_name_t *)NULL</code> the REQUIRE assertion will fail and the server will crash with no attempt at recovery. Code calling this function must take extra care to pass a pointer to a NULL <code>dns_name_t *</code>, which the function will fill in to return the found name.</p><p>In non-memory-safe languages it is not uncommon to crash when a programmer assertion is violated, because a program might not be able to clean up its own memory after something that is not supposed to happen happens.</p><p>So we continue our investigation by climbing up the stack trace to find the illegal call. Next step is <code>dns_tkey_processquery</code>. Here is a simplified excerpt:</p>
            <pre><code>// https://source.isc.org/git/bind9.git -- faa3b61 -- lib/dns/tkey.c

isc_result_t
dns_tkey_processquery(dns_message_t *msg, dns_tkeyctx_t *tctx,
              dns_tsig_keyring_t *ring)
{
    isc_result_t result = ISC_R_SUCCESS;
    dns_name_t *qname, *name;
    dns_rdataset_t *tkeyset;

    /*
     * Interpret the question section.
     */
    result = dns_message_firstname(msg, DNS_SECTION_QUESTION);
    if (result != ISC_R_SUCCESS)
        return (DNS_R_FORMERR);

    qname = NULL;
    dns_message_currentname(msg, DNS_SECTION_QUESTION, &amp;qname);

    /*
     * Look for a TKEY record that matches the question.
     */
    tkeyset = NULL;
    name = NULL;
    result = dns_message_findname(msg, DNS_SECTION_ADDITIONAL, qname,
                      dns_rdatatype_tkey, 0, &amp;name, &amp;tkeyset);
    if (result != ISC_R_SUCCESS) {
        /*
         * Try the answer section, since that's where Win2000
         * puts it.
         */
        if (dns_message_findname(msg, DNS_SECTION_ANSWER, qname,
                     dns_rdatatype_tkey, 0, &amp;name,
                     &amp;tkeyset) != ISC_R_SUCCESS) {
            result = DNS_R_FORMERR;
            tkey_log("dns_tkey_processquery: couldn't find a TKEY "
                 "matching the question");
            goto failure;
        }
    }

[...]</code></pre>
            <p>There are two <code>dns_message_findname</code> calls here. Since we are looking for the one that passes a dirty <code>name</code>, we can ignore the first one, which is preceded by an explicit <code>name = NULL;</code>.</p><p>The second call is more interesting. The same <code>dns_name_t *name</code> is reused without resetting it to NULL after the previous <code>dns_message_findname</code> call. This must be where the bug is.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/56BHTJoLMp0YSCIWfkeVk8/92cb78c44941ef2e8defe736da554296/8566054163_6f3f9e42da_z.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a> <a href="https://www.flickr.com/photos/rarvesen/8566054163/in/album-72157633018017313/">image</a> by <a href="https://www.flickr.com/photos/rarvesen/">Ralph Aversen</a></p><p>Now the question is: when would <code>dns_message_findname</code> set <code>name</code> but not return <code>ISC_R_SUCCESS</code> (so that the <i>if</i> is satisfied)? Let's have a look at the full function body now.</p>
            <pre><code>// https://source.isc.org/git/bind9.git -- faa3b61 -- lib/dns/message.c

isc_result_t
dns_message_findname(dns_message_t *msg, dns_section_t section,
             dns_name_t *target, dns_rdatatype_t type,
             dns_rdatatype_t covers, dns_name_t **name,
             dns_rdataset_t **rdataset)
{
    dns_name_t *foundname;
    isc_result_t result;

    /*
     * XXX These requirements are probably too intensive, especially
     * where things can be NULL, but as they are they ensure that if
     * something is NON-NULL, indicating that the caller expects it
     * to be filled in, that we can in fact fill it in.
     */
    REQUIRE(msg != NULL);
    REQUIRE(VALID_SECTION(section));
    REQUIRE(target != NULL);
    if (name != NULL)
        REQUIRE(*name == NULL);
    if (type == dns_rdatatype_any) {
        REQUIRE(rdataset == NULL);
    } else {
        if (rdataset != NULL)
            REQUIRE(*rdataset == NULL);
    }

    result = findname(&amp;foundname, target,
              &amp;msg-&gt;sections[section]);

    if (result == ISC_R_NOTFOUND)
        return (DNS_R_NXDOMAIN);
    else if (result != ISC_R_SUCCESS)
        return (result);

    if (name != NULL)
        *name = foundname;

    /*
     * And now look for the type.
     */
    if (type == dns_rdatatype_any)
        return (ISC_R_SUCCESS);

    result = dns_message_findtype(foundname, type, covers, rdataset);
    if (result == ISC_R_NOTFOUND)
        return (DNS_R_NXRRSET);

    return (result);
}</code></pre>
            <p>As you can see <code>dns_message_findname</code> uses first <code>findname</code> to match the records with the target name, and then <code>dns_message_findtype</code> to match the target type. In between the two calls... <code>*name = foundname</code>! So if <code>dns_message_findname</code> can find a record with <code>name == qname</code> in <code>DNS_SECTION_ADDITIONAL</code> but then it turns out not to have type <code>dns_rdatatype_tkey</code>, <code>name</code> will be filled in and a failure returned. The second <code>dns_message_findname</code> call will trigger on the dirty <code>name</code> and... boom.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/V4HKcvOVDy2i5aFe2FJAF/d9fc70961af38c66aef05a9800bb8792/8566054013_9202ac3209_z.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a> <a href="https://www.flickr.com/photos/rarvesen/8566054163/in/album-72157633018017313/">image</a> by <a href="https://www.flickr.com/photos/rarvesen/">Ralph Aversen</a></p><p>Indeed, the patch just adds <code>name = NULL</code> before the second call. (No, we couldn't have started our investigation from the patch; what's the fun in that!?)</p>
            <pre><code>diff --git a/lib/dns/tkey.c b/lib/dns/tkey.c
index 66210d5..34ad90b 100644
--- a/lib/dns/tkey.c
+++ b/lib/dns/tkey.c
@@ -654,6 +654,7 @@ dns_tkey_processquery(dns_message_t *msg, dns_tkeyctx_t *tctx,
 		 * Try the answer section, since that's where Win2000
 		 * puts it.
 		 */
+		name = NULL;
 		if (dns_message_findname(msg, DNS_SECTION_ANSWER, qname,
 					 dns_rdatatype_tkey, 0, &amp;name,
 					 &amp;tkeyset) != ISC_R_SUCCESS) {</code></pre>
            <p>To recap, here is the bug flow:</p><ul><li><p>a <b>query for type TKEY</b> is received, <code>dns_tkey_processquery</code> is called to parse it</p></li><li><p><code>dns_message_findname</code> is called a first time on the EXTRA section</p></li><li><p><b>a record with the same name as the query is found in the EXTRA section</b>, causing <code>name</code> to be filled, <b>but it's not a TKEY record</b>, causing <code>result != ISC_R_SUCCESS</code></p></li><li><p><code>dns_message_findname</code> is called a second time to look in the ANS section, and it is passed the now dirty <code>name</code> reference</p></li><li><p>the assertion <code>*name != NULL</code> fails, <b>BIND crashes</b></p></li></ul><p>This bug <a href="https://twitter.com/ISCdotORG/status/626132833849905152">was found with</a> the amazing <a href="http://lcamtuf.coredump.cx/afl/"><i>american fuzzy lop</i></a> fuzzer by <a href="https://twitter.com/@jfoote_">@jfoote_</a>. A fuzzer is an automated tool that keeps feeding automatically mutated inputs to a target program until it crashes. You can see how it eventually stumbled upon the TKEY query + non-TKEY EXTRA RR combo and found this bug.</p>
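The use-after-failure pattern at the heart of the bug is easy to reproduce outside of BIND. Below is a toy Go model of the bug flow above (all names are hypothetical sketches, not BIND code): the first lookup dirties its out-parameter even though it fails, and a second lookup guarded by a REQUIRE-style precondition then aborts unless the caller resets the pointer first, just like the patch does.

```go
package main

import (
	"errors"
	"fmt"
)

// findName is a toy model of dns_message_findname (hypothetical, not BIND
// code): it fills in the out-parameter as soon as the name matches, even if
// the type check then fails.
func findName(section map[string]string, qname, qtype string, out *string) error {
	rtype, ok := section[qname]
	if !ok {
		return errors.New("NXDOMAIN")
	}
	*out = qname // filled in before the type check...
	if rtype != qtype {
		return errors.New("NXRRSET") // ...so a failure leaves it dirty
	}
	return nil
}

// strictFindName models the second call, guarded by the REQUIRE(*name == NULL)
// precondition that aborts BIND.
func strictFindName(section map[string]string, qname, qtype string, out *string) error {
	if *out != "" {
		panic("REQUIRE(*name == NULL) failed")
	}
	return findName(section, qname, qtype, out)
}

func main() {
	additional := map[string]string{"example.com.": "A"} // same name, not TKEY
	answer := map[string]string{}                        // no TKEY here either

	name := new(string)
	_ = findName(additional, "example.com.", "TKEY", name) // fails, dirties name

	*name = "" // the one-line fix from the patch: reset before reuse
	fmt.Println(strictFindName(answer, "example.com.", "TKEY", name)) // NXDOMAIN instead of a crash
}
```

Delete the `*name = ""` line and the program panics, mirroring the BIND assertion failure.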
    <div>
      <h3>Virtual DNS customers have always been protected</h3>
      <a href="#virtual-dns-customers-have-always-been-protected">
        
      </a>
    </div>
    <p>Good news! <a href="https://www.cloudflare.com/virtual-dns">CloudFlare Virtual DNS</a> customers have always been protected from this attack, even if they run BIND. Our custom Go DNS server, RRDNS, parses and sanitizes all queries before forwarding them to the origin servers if needed.</p><p>Since Virtual DNS does not support TSIG and TKEY (which are meant to authenticate server-to-server traffic, not recursive lookups), it has no reason to relay EXTRA section records in queries, so it doesn't! That reduces the attack surface and indeed makes it impossible to exploit this vulnerability through Virtual DNS.</p><p>No special rules are in place to protect from this specific vulnerability: RRDNS always validates incoming packets, making sure they look like regular queries, and strips them down to the simplest form possible before relaying them.</p>
            <category><![CDATA[Vulnerabilities]]></category>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Programming]]></category>
            <guid isPermaLink="false">60sgN5iyY1xyuGiTzlqnxo</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Quick and dirty annotations for Go stack traces]]></title>
            <link>https://blog.cloudflare.com/quick-and-dirty-annotations-for-go-stack-traces/</link>
            <pubDate>Mon, 03 Aug 2015 11:26:04 GMT</pubDate>
            <description><![CDATA[ CloudFlare’s DNS server, RRDNS, is entirely written in Go and typically runs tens of thousands of goroutines. Since goroutines are cheap and Go I/O is blocking, we run one goroutine per file descriptor we listen on and queue new packets for processing. ]]></description>
            <content:encoded><![CDATA[ <p>CloudFlare’s DNS server, <a href="/tag/rrdns/">RRDNS</a>, is entirely written in Go and typically runs tens of thousands of goroutines. Since goroutines are cheap and Go I/O is blocking, we run one goroutine per file descriptor we listen on and queue new packets for processing.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2v4jReMMfDFsc62GnuXpFo/13aafcf00d9d9b425cdf294f62fa7f76/6372385465_014a4e56af_z.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by-sa/2.0/">CC BY-SA 2.0</a> <a href="https://www.flickr.com/photos/wiredforsound23/6372385465/">image</a> by <a href="https://www.flickr.com/photos/wiredforsound23/">wiredforlego</a></p><p>When there are thousands of goroutines running, debug output quickly becomes difficult to interpret. For example, last week I was tracking down a problem with a file descriptor and wanted to know what its listening goroutine was doing. <i>With 40k stack traces, good luck figuring out which one is having trouble.</i></p><p>Go stack traces include parameter values, but most Go types are (or are implemented as) pointers, so what you will see passed to the goroutine function is just a meaningless memory address.</p><p>We have a couple of options to make sense of the addresses: get a heap dump at the same time as the stack trace and cross-reference the pointers, or have a debug endpoint that prints a goroutine/pointer -&gt; IP map. Neither is seamless.</p>
    <div>
      <h3>Underscore to the rescue</h3>
      <a href="#underscore-to-the-rescue">
        
      </a>
    </div>
    <p>However, we know that <i>integers are shown in traces</i>, so what we did is first <i>convert IPv4 addresses to their uint32 representation</i>:</p>
            <pre><code>// addrToUint32 takes a TCPAddr or UDPAddr and converts its IP to a uint32.
// If the IP is v6, 0xffffffff is returned.
func addrToUint32(addr net.Addr) uint32 {
       var ip net.IP
       switch addr := addr.(type) {
       case *net.TCPAddr:
               ip = addr.IP
       case *net.UDPAddr:
               ip = addr.IP
       case *net.IPAddr:
               ip = addr.IP
       }
       if ip == nil {
               return 0
       }
       ipv4 := ip.To4()
       if ipv4 == nil {
               return math.MaxUint32
       }
       return uint32(ipv4[0])&lt;&lt;24 | uint32(ipv4[1])&lt;&lt;16 | uint32(ipv4[2])&lt;&lt;8 | uint32(ipv4[3])
}</code></pre>
            <p>And then <i>pass the IPv4-as-uint32 to the listening goroutine as an </i><code><i>_</i></code><i> parameter</i>. Yes, as a parameter with name <code>_</code>; it's known as the <a href="https://golang.org/ref/spec#Blank_identifier">blank identifier</a> in Go.</p>
            <pre><code>// PacketUDPRead is a goroutine that listens on a specific UDP socket and reads
// in new requests
// The first parameter is the int representation of the listening IP address,
// and it's passed just so it will appear in stack traces
func PacketUDPRead(_ uint32, conn *net.UDPConn, ...) { ... }

go PacketUDPRead(addrToUint32(conn.LocalAddr()), conn, ...)</code></pre>
            <p>Now when we get a stack trace, we can just look at the first bytes, convert them back to dotted notation, and know on what IP the goroutine was listening.</p>
            <pre><code>goroutine 42 [IO wait]:
	[...]
	/.../request.go:195 +0x5d
rrdns/core.PacketUDPRead(0xc27f000001, 0x2b6328113ad8, 0xc20801ecc0, 0xc208044308, 0xc208e99280, 0xc208ad8180, 0x12a05f200)
	/.../server.go:119 +0x35a
created by rrdns/core.PacketIO
	/.../server.go:230 +0x8be</code></pre>
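Decoding the first argument of the `PacketUDPRead` frame back into dotted notation takes only a couple of standard-library calls; here is a small illustrative helper (not part of RRDNS):

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// uint32ToAddr is the inverse of addrToUint32 above: it turns the integer
// seen in a stack trace back into dotted notation.
func uint32ToAddr(v uint32) string {
	b := make([]byte, 4)
	binary.BigEndian.PutUint32(b, v) // most significant byte first
	return net.IP(b).String()
}

func main() {
	// 0xc27f000001 from the trace, with the leading alignment byte dropped.
	fmt.Println(uint32ToAddr(0x7f000001)) // 127.0.0.1
}
```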
            <p><code>0xc27f000001</code> -&gt; remove alignment byte -&gt; <code>0x7f000001</code> -&gt; <code>127.0.0.1</code></p><p>Obviously you can do the same with any piece of information you can represent as an <code>int</code>.</p><p><i>Are you interested in taming the goroutines that run the web? We're </i><a href="https://www.cloudflare.com/join-our-team"><i>hiring</i></a><i> in London, San Francisco and Singapore!</i></p> ]]></content:encoded>
            <category><![CDATA[RRDNS]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[Go]]></category>
            <category><![CDATA[Programming]]></category>
            <guid isPermaLink="false">2Eu6vhCtFVL0I002N3mpV2</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Setting Go variables from the outside]]></title>
            <link>https://blog.cloudflare.com/setting-go-variables-at-compile-time/</link>
            <pubDate>Wed, 01 Jul 2015 13:26:37 GMT</pubDate>
            <description><![CDATA[ CloudFlare's DNS server, RRDNS, is written in Go and the DNS team used to generate a file called version.go in our Makefile. version.go looked something like this. ]]></description>
            <content:encoded><![CDATA[ <p>CloudFlare's DNS server, <a href="/tag/rrdns/">RRDNS</a>, is written in Go and the DNS team used to generate a file called <code>version.go</code> in our Makefile. <code>version.go</code> looked something like this:</p>
            <pre><code>// THIS FILE IS AUTOGENERATED BY THE MAKEFILE. DO NOT EDIT.

// +build	make

package version

var (
	Version   = "2015.6.2-6-gfd7e2d1-dev"
	BuildTime = "2015-06-16-0431 UTC"
)</code></pre>
            <p>and was used to embed version information in RRDNS. It was built inside the Makefile using <code>sed</code> and <code>git describe</code> from a template file. It worked, but was pretty ugly.</p><p>Today we noticed that another Go team at CloudFlare, the <a href="/tag/data/">Data</a> team, had <b>a much smarter way to bake version numbers into binaries using the </b><code><b>-X</b></code><b> linker option</b>.</p><p>The <code>-X</code> <a href="https://golang.org/cmd/ld/">Go linker option</a>, which you can set with <code>-ldflags</code>, sets the value of a string variable in the Go program being linked. You use it like this: <code>-X main.version 1.0.0</code>.</p><p>A simple example: let's say you have this source file saved as <code>hello.go</code>.</p>
            <pre><code>package main

import "fmt"

var who = "World"

func main() {
    fmt.Printf("Hello, %s.\n", who)
}</code></pre>
            <p>Then you can use <code>go run</code> (or other build commands like <code>go build</code> or <code>go install</code>) with the <code>-ldflags</code> option to modify the value of the <code>who</code> variable:</p>
            <pre><code>$ go run hello.go
Hello, World.
$ go run -ldflags="-X main.who CloudFlare" hello.go
Hello, CloudFlare.</code></pre>
            <p>The format is <code>importpath.name string</code>, so it's possible to set the value of any string anywhere in the Go program, not just in main. Note that from Go 1.5 the syntax has <a href="https://tip.golang.org/cmd/link/">changed</a> to <code>importpath.name=string</code>. The old style is still supported but the linker will complain.</p><p>I was worried this would not work with external linking (<a href="https://docs.google.com/document/d/1nr-TQHw_er6GOQRsF6T43GGhFDelrAP0NqSS_00RgZQ/view">for example when using cgo</a>) but as we can see with <code>-ldflags="-linkmode=external -v"</code> the Go linker runs first anyway and takes care of our <code>-X</code>.</p>
            <pre><code>$ go build -x -ldflags="-X main.who CloudFlare -linkmode=external -v" hello.go
WORK=/var/folders/v8/xdj2snz51sg2m2bnpmwl_91c0000gn/T/go-build149644699
mkdir -p $WORK/command-line-arguments/_obj/
cd /Users/filippo/tmp/X
/usr/local/Cellar/go/1.4.2/libexec/pkg/tool/darwin_amd64/6g -o $WORK/command-line-arguments.a -trimpath $WORK -p command-line-arguments -complete -D _/Users/filippo/tmp/X -I $WORK -pack ./hello.go
cd .
/usr/local/Cellar/go/1.4.2/libexec/pkg/tool/darwin_amd64/6l -o hello -L $WORK -X main.hi hi -linkmode=external -v -extld=clang $WORK/command-line-arguments.a
# command-line-arguments
HEADER = -H1 -T0x2000 -D0x0 -R0x1000
searching for runtime.a in $WORK/runtime.a
searching for runtime.a in /usr/local/Cellar/go/1.4.2/libexec/pkg/darwin_amd64/runtime.a
 0.06 deadcode
 0.07 pclntab=284969 bytes, funcdata total 49800 bytes
 0.07 dodata
 0.08 symsize = 0
 0.08 symsize = 0
 0.08 reloc
 0.09 reloc
 0.09 asmb
 0.09 codeblk
 0.09 datblk
 0.09 dwarf
 0.09 sym
 0.09 headr
host link: clang -m64 -gdwarf-2 -Wl,-no_pie,-pagezero_size,4000000 -o hello -Qunused-arguments /var/folders/v8/xdj2snz51sg2m2bnpmwl_91c0000gn/T//go-link-mFNNCD/000000.o /var/folders/v8/xdj2snz51sg2m2bnpmwl_91c0000gn/T//go-link-mFNNCD/000001.o /var/folders/v8/xdj2snz51sg2m2bnpmwl_91c0000gn/T//go-link-mFNNCD/go.o -g -O2 -g -O2 -lpthread
 0.17 cpu time
33619 symbols
64 sizeof adr
216 sizeof prog
23412 liveness data</code></pre>
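Tying this back to `version.go`: with the Go 1.5+ `=` syntax mentioned above, the whole Makefile/`sed` dance collapses to plain variables plus linker flags. A sketch (the variable names and the `git describe` invocation are illustrative, not our actual build):

```go
package main

import "fmt"

// Version and BuildTime default to placeholders and are meant to be
// overridden at link time with the Go 1.5+ `=` syntax, e.g.:
//   go build -ldflags "-X main.Version=$(git describe --always)" versioned.go
var (
	Version   = "unknown"
	BuildTime = "unknown"
)

func main() {
	// Without -ldflags, the placeholders are printed.
	fmt.Printf("built %s at %s\n", Version, BuildTime)
}
```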
            <p><i>Do you want to work next to Go developers that can always make you learn a new trick? </i><a href="https://www.cloudflare.com/join-our-team"><i>We are hiring in London, San Francisco and Singapore</i></a><i>.</i></p> ]]></content:encoded>
            <category><![CDATA[RRDNS]]></category>
            <category><![CDATA[Programming]]></category>
            <category><![CDATA[Go]]></category>
            <category><![CDATA[Reliability]]></category>
            <guid isPermaLink="false">7JoU9nV9zNjoGpJYgiusvV</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Go has a debugger—and it's awesome!]]></title>
            <link>https://blog.cloudflare.com/go-has-a-debugger-and-its-awesome/</link>
            <pubDate>Thu, 18 Jun 2015 11:14:00 GMT</pubDate>
            <description><![CDATA[ Something that often, uh... bugs Go developers is the lack of a proper debugger. Builds are ridiculously fast and easy, but sometimes it would be nice to just set a breakpoint and step through that endless if chain or print a bunch of values without recompiling ten times. ]]></description>
            <content:encoded><![CDATA[ <p>Something that often, uh... <i>bugs</i><a href="#fn1">[1]</a> Go developers is the <b>lack of a proper debugger</b>. Sure, builds are ridiculously fast and easy, and <code>println(hex.Dump(b))</code> is your friend, but sometimes it would be nice to just set a breakpoint and step through that endless <code>if</code> chain or print a bunch of values without recompiling ten times.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1PlA0CeyPc2zJ6Ai6u9f9u/77f31929ad59993a59bb24367a46d852/12294903084_3a3d128ae7_z.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a> <a href="https://www.flickr.com/photos/62766743@N07/12294903084/in/photolist-jJsAkE-hiHrhB-9TNjzG-9TMKnB-9TKuyt-9TKuQx-4rHRku-9TNj1L-dCD4Ay-bbk7in-ngEQwy-q577yv-qmsFPs-qXFbRy-dCMyqk-rmqu1H-tncWw9-fzkCLf-54MZxq-9ZCivM-fdC6b-5jvVQ7-q4YkxA-2vVkpu-aY6pnx-9TNiVC-j8TKCC-9TNji3-dKjVwD-eRrMtP-dVJA3D-bwjW2u-ohnZh9-iRdXBy-dWXXKe-fdT8VT-ePmAs-ecdQqy-ieu7sA-iFi5z-j6m1Qs-ncgQ2q-7W3hJi-r17FpD-ekipUs-jYbRdy-ckWNBh-gT4VL-9TNjvC-9TNjpL">image</a> by <a href="https://www.flickr.com/photos/62766743@N07/">Carl Milner</a></p><p>You <i>could</i> try to use some dirty gdb hacks that will work if you built your binary with a certain linker and ran it on some architectures when the moon was in a waxing crescent phase, but let's be honest, it isn't an enjoyable experience.</p><p>Well, worry no more! <a href="https://github.com/mailgun/godebug">godebug</a> is here!</p><p><b>godebug is an awesome cross-platform debugger</b> created by the Mailgun team. You can read <a href="http://blog.mailgun.com/introducing-a-new-cross-platform-debugger-for-go/">their introduction</a> for some under-the-hood details, but here's the cool bit: instead of wrestling with half a dozen different ptrace interfaces that would not be portable, <b>godebug rewrites your source code</b> and injects function calls like <code>godebug.Line</code> on every line, <code>godebug.Declare</code> at every variable declaration, and <code>godebug.SetTrace</code> for breakpoints (i.e. wherever you type <code>_ = "breakpoint"</code>).</p><p>I find this solution brilliant. What you get out of it is a (possibly cross-compiled) debug-enabled binary that you can drop on a staging server just like you would with a regular binary. When a breakpoint is reached, the program will stop inline and wait for you on stdin. 
<b>It's the single-binary, zero-dependencies philosophy of Go that we love applied to debugging.</b> Builds everywhere, runs everywhere, with no need for tools or permissions on the server. It even compiles to JavaScript with gopherjs (check out the Mailgun post above—show-offs ;) ).</p><p>You might ask, "But does it get a decent runtime speed or work with big applications?" Well, the other day I was seeing RRDNS—our in-house Go DNS server—hit a weird branch, so I placed a breakpoint a couple lines above the <i>if</i> in question, <b>recompiled the whole of RRDNS with godebug instrumentation</b>, dropped the binary on a staging server, and replayed some DNS traffic.</p>
            <pre><code>filippo@staging:~$ ./rrdns -config config.json
-&gt; _ = "breakpoint"
(godebug) l

    q := r.Query.Question[0]

--&gt; _ = "breakpoint"

    if !isQtypeSupported(q.Qtype) {
        return
(godebug) n
-&gt; if !isQtypeSupported(q.Qtype) {
(godebug) q
dns.Question{Name:"filippo.io.", Qtype:0x1, Qclass:0x1}
(godebug) c</code></pre>
            <p>Boom. The request and the debug log paused (make sure to terminate any timeout you have in your tools), waiting for me to step through the code.</p><p>Sold yet? Here's how you use it: simply run <code>godebug {build|run|test}</code> instead of <code>go {build|run|test}</code>. <a href="https://github.com/mailgun/godebug/pull/32/commits">We adapted godebug</a> to resemble the go tool as much as possible. Remember to use <code>-instrument</code> if you want to be able to step into packages that are not <i>main</i>.</p><p>For example, here is part of the RRDNS Makefile:</p>
            <pre><code>bin/rrdns:
ifdef GODEBUG
	GOPATH="${PWD}" go install github.com/mailgun/godebug
	GOPATH="${PWD}" ./bin/godebug build -instrument "${GODEBUG}" -o bin/rrdns rrdns
else
	GOPATH="${PWD}" go install rrdns
endif

test:
ifdef GODEBUG
	GOPATH="${PWD}" go install github.com/mailgun/godebug
	GOPATH="${PWD}" ./bin/godebug test -instrument "${GODEBUG}" rrdns/...
else
	GOPATH="${PWD}" go test rrdns/...
endif</code></pre>
            <p>Debugging is just a <code>make bin/rrdns GODEBUG=rrdns/...</code> away.</p><p>This tool is still young, but in my experience, perfectly functional. The UX could use some love if you can spare some time (as you can see above it's pretty spartan), but it should be easy to build on what's there already.</p>
    <div>
      <h2>About source rewriting</h2>
      <a href="#about-source-rewriting">
        
      </a>
    </div>
    <p>Before closing, I'd like to say a few words about the technique of source rewriting in general. It powers many different Go tools, like <a href="https://blog.golang.org/cover">test coverage</a>, <a href="https://github.com/dvyukov/go-fuzz">fuzzing</a> and, indeed, debugging. It's made possible primarily by Go’s blazing-fast compiles, and it enables amazing cross-platform tools to be built easily.</p><p>However, since it's such a handy and powerful pattern, I feel like <b>there should be a standard way to apply it in the context of the build process</b>. After all, all the source rewriting tools need to implement a subset of the following features:</p><ul><li><p>Wrap the main function</p></li><li><p>Conditionally rewrite source files</p></li><li><p>Keep global state</p></li></ul><p>Why should every tool have to reinvent all the boilerplate to copy the source files, rewrite the source, make sure stale objects are not used, build the right packages, run the right tests, and interpret the CLI..? Basically, all of <a href="https://github.com/mailgun/godebug/blob/f8742f647adb8ee17a1435de3b1929d36df590c8/cmd.go">godebug/cmd.go</a>. And what about <a href="http://getgb.io/">gb</a>, for example?</p><p>I think we need a framework for Go source code rewriting tools. (Spoiler, spoiler, ...)</p><p><i>If you’re interested in working on Go servers at scale and developing tools to do it better, remember </i><a href="https://www.cloudflare.com/join-our-team"><i>we’re hiring in London, San Francisco, and Singapore</i></a><i>!</i></p><hr /><hr /><ol><li><p>I'm sorry. <a href="#fnref1">↩︎</a></p></li></ol> ]]></content:encoded>
            <category><![CDATA[RRDNS]]></category>
            <category><![CDATA[Tools]]></category>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Programming]]></category>
            <category><![CDATA[Go]]></category>
            <category><![CDATA[Developers]]></category>
            <guid isPermaLink="false">7rlszh5ZEwkE3JfCjkJkZv</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Logjam: the latest TLS vulnerability explained]]></title>
            <link>https://blog.cloudflare.com/logjam-the-latest-tls-vulnerability-explained/</link>
            <pubDate>Wed, 20 May 2015 23:52:53 GMT</pubDate>
            <description><![CDATA[ Yesterday, a group from INRIA, Microsoft Research, Johns Hopkins, the University of Michigan, and the University of Pennsylvania published a deep analysis of the Diffie-Hellman algorithm as used in TLS and other protocols.  ]]></description>
            <content:encoded><![CDATA[ <p></p><p><i>Image: "Logjam" </i><a href="https://twitter.com/0xabad1dea/status/600874766527012865"><i>as interpreted by @0xabad1dea</i></a></p><p>Yesterday, a group from INRIA, Microsoft Research, Johns Hopkins, the University of Michigan, and the University of Pennsylvania <a href="https://weakdh.org/imperfect-forward-secrecy.pdf">published</a> a deep analysis of the Diffie-Hellman algorithm as used in TLS and other protocols. This analysis included a novel downgrade attack against the TLS protocol itself called <a href="https://weakdh.org/"><b>Logjam</b></a>, which exploits EXPORT cryptography (just like <a href="/cloudflare-sites-are-protected-from-freak/">FREAK</a>).</p><p>First, let me start by saying that <b>CloudFlare customers are not and were never affected</b>. We don’t support non-EC Diffie-Hellman ciphersuites on either the client or origin side. We also won't touch EXPORT-grade cryptography with a 20ft stick.</p><p>But why are CloudFlare customers safe, and how does Logjam work anyway?</p>
    <div>
      <h3>Diffie-Hellman and TLS</h3>
      <a href="#diffie-hellman-and-tls">
        
      </a>
    </div>
    <p><i>This is a detailed technical introduction to how DH works and how it’s used in TLS—if you already know this and want to read about the attack, skip to “Enter export crypto, enter Logjam” below. If, instead, you are not interested in the nuts and bolts and want to know who’s at risk, skip to “So, what’s affected?”</i></p><p>To start a TLS connection, the two sides—client (the browser) and server (CloudFlare)—need to agree securely on a secret key. This process is called <b>Key Exchange</b> and it happens during the TLS Handshake: the exchange of messages that take place before encrypted data can be transmitted.</p><p>There is a detailed description of the TLS handshake in the first part of <a href="/keyless-ssl-the-nitty-gritty-technical-details/">this previous blog post by Nick Sullivan</a>. In the following, I’ll only discuss the ideas you’ll need to understand the attack at hand.</p><p>There are many types of Key Exchanges: static RSA, Diffie-Hellman (DHE cipher suites), Elliptic Curve Diffie-Hellman (ECDHE cipher suites), and some less used methods.</p><p>An important property of DHE and ECDHE key exchanges is that they provide <a href="/staying-on-top-of-tls-attacks/#forwardsecrecy">Forward Secrecy</a>. That is, even if the server key is compromised at some point, it can’t be used to decrypt past connections. It’s important to protect the information exchanged from future breakthroughs, and we’re proud to say that 94% of CloudFlare connections provide it.</p><p>This research—and this attack—applies to the normal non-EC <b>Diffie-Hellman key exchange (DHE)</b>. 
This is how it works at a high level (don’t worry, I’ll explain each part in more detail below):</p><ol><li><p>The client advertises support for DHE cipher suites when opening a connection (in what is called a Client Hello message)</p></li><li><p>The server picks the parameters and performs its half of the DH computation using those parameters</p></li><li><p>The server signs parameters and its DH share with its long-term certificate and sends the whole thing to the client</p></li><li><p>The client checks the signature, uses the parameters to perform its half of the computation and sends the result to the server</p></li><li><p>Both parties put everything together and derive a shared secret, which they will use as the key to secure the connection</p></li></ol><p>(For a more in-depth analysis of each step see the link to Nick Sullivan’s blog post above, <a href="/keyless-ssl-the-nitty-gritty-technical-details/#ephemeraldiffiehellmanhandshake">“Ephemeral Diffie-Hellman Handshake” section</a>.)</p>
            <figure>
            <a href="http://staging.blog.mrk.cfdata.org/content/images/2015/05/ssl_handshake_diffie_hellman.jpg">
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7cBRR2q8hqa1xKfoWLSFcl/73ac27fa495f52d5983d81b32d814fb4/ssl_handshake_diffie_hellman.jpg" />
            </a>
            </figure><p>Let’s explain some of the terms that just passed by your screen. “The client” is the browser and “the server” is the website (or CloudFlare’s edge serving the website).</p><p>“The parameters” (or <i>group</i>) are some big numbers that are used as the base for the DH computations. <b>They can be, and often are, fixed. The security of the final secret depends on the size of these parameters.</b> This research deemed 512 and 768 bits to be weak, 1024 bits to be breakable by really powerful attackers like governments, and 2048 bits to be a safe size.</p><p>The certificate contains a public key and is what you (or CloudFlare for you) get issued from a CA for your website. The client makes sure it’s issued by a CA it trusts and that it’s valid for the visited website. The server uses the corresponding private key to cryptographically sign its share of the DH key exchange so that the client can be sure it’s agreeing on a connection key with the real owner of the website, not a MitM.</p><p>Finally, the DH computation: there’s a <a href="https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange#Description">beautiful explanation of this on Wikipedia which uses <i>paint</i></a>. The tl;dr is:</p><ol><li><p>The server picks a secret ‘a’</p></li><li><p>Then it computes—using some parameters as a base—a value ‘A’ from it and sends that to the client (not ‘a’!)</p></li><li><p>The client picks a secret ‘b’, takes the parameters from the server and likewise computes a value ‘B’ that it sends to the server</p></li><li><p>Both parties put together ‘a’ + ‘B’ or ‘b’ + ‘A’ to derive a shared, identical secret - which is impossible to compute from ‘A’ + ‘B’, the only things that travelled on the wire</p></li></ol><p>The security of all this depends on the strength/size of the parameters.</p>
    <div>
      <h3>Enter export crypto, enter Logjam</h3>
      <a href="#enter-export-crypto-enter-logjam">
        
      </a>
    </div>
    <p>So far, so good. Diffie-Hellman is nice, it provides Forward Secrecy, it’s secure if the parameters are big enough, and the parameters are picked and signed by the server. So what’s the problem?</p><p>Enter “export cryptography”! <b>Export cryptography</b> is a relic of the 90’s US restrictions on cryptography export. In order to support SSL in countries to which the U.S. had disallowed exporting "strong cryptography", many implementations support weakened modes called EXPORT modes.</p><p>We’ve already seen an attack that succeeded because connections could be forced to use these modes even if they wouldn’t want to; this is what happened with the FREAK vulnerability. It’s telling that 20 years after these modes became useless we are still dealing with the outcome of the added complexity.</p><p>How it works with Diffie-Hellman is that the client requests a <code>DHE_EXPORT</code> ciphersuite instead of the corresponding <code>DHE</code> one. Seeing that, the server (if it supports DHE_EXPORT) <i>picks small, breakable 512-bit parameters</i> for the exchange, and carries on with a regular DHE key exchange. <b>The server doesn’t signal back securely to the client that it picked such small DH parameters because of the EXPORT ciphersuite</b>.</p><p>This is the protocol flaw at the heart of the Logjam “downgrade attack”:</p><ul><li><p>A MitM attacker intercepts a client connection and replaces all the accepted ciphersuites with only the DHE_EXPORT ones</p></li><li><p>The server picks weak 512-bit parameters, does its half of the computation, and signs the parameters with the certificate’s private key. <b>Neither the Client Hello, the client ciphersuites, nor the chosen ciphersuite are signed by the server!</b></p></li><li><p>The client is led to believe that the server picked a DHE Key Exchange and just willingly decided on small parameters. 
From its point of view, it has no way to know that the server was tricked by the MitM into doing so!</p></li><li><p>The attacker would then break one of the two weak DH shares, recover the connection key, and proceed with the TLS connection with the client</p></li></ul>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7bbVK9rknxBcXalLiCGfyk/952480fdee17b7a628e076139ef9b002/https-weakdh-org-imperfect-forward-secrecy-pdf-2015-05-21-01-19-48.png" />
            
            </figure><p><a href="https://weakdh.org/imperfect-forward-secrecy.pdf"><i>Imperfect Forward Secrecy: How Diffie-Hellman Fails in Practice</i></a><i> - Figure 2</i></p><p>The client has no other way to protect itself besides drawing a line in the sand about how weak the DHE parameters can be (e.g. at least 1024 bits) and refusing to connect to servers that want to pick smaller ones. This is what all modern browsers are now doing, but it wasn’t done before because it causes breakage, and it was believed that there was no way to trick a server into choosing such weak parameters if it wouldn’t normally do so.</p><p>The servers can protect themselves by refusing EXPORT ciphersuites and never signing small parameters.</p>
    <div>
      <h3>About parameters size and reuse</h3>
      <a href="#about-parameters-size-and-reuse">
        
      </a>
    </div>
    <p>But how small is too small for DH parameters? The Logjam paper also analyzes this in depth. The first thing to understand is that <b>parameters can be and often are reused</b>. 17.9% of the Top 1 Million Alexa Domains used the same 1024-bit parameters.</p><p><b>An attacker can perform the bulk of the computation using only the parameters</b>, and then break any DH exchange that uses them in minutes. So when many sites (or VPN servers, etc.) share the same parameters, the investment of time needed to “break” the parameters makes much more sense, since it would then allow the attacker to break many connections with little extra effort.</p><p>The research team performed the precomputation on the most common 512-bit (EXPORT) parameters to demonstrate the impact of Logjam, but they express concerns that real, more powerful attackers might do the same with the common normal-DHE 1024-bit parameters.</p><p>Finally, in their Internet-wide scan they discovered that many servers will <b>provide vulnerable 512-bit parameters even for non-EXPORT DHE</b>, in order to support older TLS implementations (for example, old Java versions).</p>
    <div>
      <h3>So, what’s affected?</h3>
      <a href="#so-whats-affected">
        
      </a>
    </div>
    <p><b>A client/browser is affected if it accepts small DHE parameters as part of any connection</b>, since it has no way to know that it’s being tricked into a weak EXPORT-level connection. Most major browsers at the time of this writing are vulnerable, but <a href="https://groups.google.com/a/chromium.org/forum/#!msg/security-dev/WyGIpevBV1s">are moving to restrict DH parameters to a 1024-bit minimum</a>. You can check yours by visiting <a href="https://weakdh.org">weakdh.org</a>.</p><p><b>A server/website is vulnerable if it supports the DHE_EXPORT ciphersuites or if it uses small parameters for DHE.</b> You can find a test and instructions on how to fix this at <a href="https://weakdh.org/sysadmin.html">https://weakdh.org/sysadmin.html</a>. 8.4% of Alexa Top Million HTTPS websites were initially vulnerable (with 82% and 10% of them using the same two parameter sets, making precomputation more viable). CloudFlare servers accept neither DHE_EXPORT nor DHE; we offer ECDHE instead.</p><p>Some interesting related statistics: <b>94% of the TLS connections to CloudFlare customer sites use ECDHE</b> (more precisely, 90% of those are <code>ECDHE-RSA-AES</code> of some sort and 10% <a href="/do-the-chacha-better-mobile-performance-with-cryptography/"><code>ECDHE-RSA-CHACHA20-POLY1305</code></a>) and therefore provide Forward Secrecy. The rest use static RSA (5.5% with AES, 0.6% with 3DES).</p><p><i>Both the client and the server need to be vulnerable</i> for the attack to succeed: the server must agree to sign small DHE_EXPORT parameters, and the client must accept them as valid DHE parameters.</p><p>A closing note: <b>events like this are ultimately a good thing for the security industry and the web at large</b>, since they mean that skilled people are looking at what we rely on to secure our connections and fixing its flaws. They also put a spotlight on how the added complexity of supporting reduced-strength crypto and older devices complicates and endangers all of our security efforts.</p><p>If you’ve read this far, found it interesting, and would like to work on things like this, remember that we’re <a href="https://www.cloudflare.com/join-our-team">hiring in London and San Francisco</a>!</p> ]]></content:encoded>
            <category><![CDATA[TLS]]></category>
            <category><![CDATA[Vulnerabilities]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">0ugl2V2Lvy7zDaehj24Rn</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Help us test our DNSSEC implementation]]></title>
            <link>https://blog.cloudflare.com/help-us-test-our-dnssec-implementation/</link>
            <pubDate>Thu, 29 Jan 2015 13:03:08 GMT</pubDate>
            <description><![CDATA[ Today is a big day for CloudFlare! We are publishing our first DNSSEC signed zone for the community to analyze and give feedback on. ]]></description>
            <content:encoded><![CDATA[ <p><i>See our previous post for an </i><a href="/dnssec-an-introduction/"><i>introduction to DNSSEC</i></a><i>.</i></p><p>Today is a big day for Cloudflare! We are publishing our first DNSSEC signed zone for the community to analyze and give feedback on:</p><ul><li><p><a href="http://www.cloudflare.com">www.cloudflare.com</a> - a fully signed zone managed by Cloudflare</p></li></ul><p>We've been testing our implementation internally for some time with great results, so now we want to hear from outside users how it’s working!</p><p>Here’s what you should see if you pull the records of, for example, <a href="http://www.cloudflare.com">www.cloudflare.com</a>:</p>
            <pre><code>; &lt;&lt;&gt;&gt; DiG 9.11.18 &lt;&lt;&gt;&gt; www.cloudflare.com A +dnssec
;; global options: +cmd
;; Got answer:
;; -&gt;&gt;HEADER&lt;&lt;- opcode: QUERY, status: NOERROR, id: 57987
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;www.cloudflare.com.		IN	A

;; ANSWER SECTION:
www.cloudflare.com.	294	IN	RRSIG	A 13 3 300 20210114001214 20210111221214 34505 www.cloudflare.com. QZrCZlAC29e5RYjF+Xt9l02bWYhPE9so5EZZHO07oAd1m6x4Ghbt873O t7dipnScuJcdu2zPpvFGAu5f+dtrNg==
www.cloudflare.com.	294	IN	A	104.16.123.96
www.cloudflare.com.	294	IN	A	104.16.124.96

;; Query time: 25 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Tue Jan 12 15:12:17 PST 2021
;; MSG SIZE  rcvd: 193
</code></pre>
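<p>Two details in output like the above tell you validation actually happened: the <code>ad</code> (authenticated data) flag in the header, and an <code>RRSIG</code> record accompanying the answer. A hypothetical helper (not part of any tool) that sanity-checks dig's text output for both:</p>

```python
# Hypothetical helper: given dig's text output, check that the resolver
# reported successful DNSSEC validation (the "ad" flag in the header
# flags line) and that the answer actually carries an RRSIG record.
def looks_validated(dig_output: str) -> bool:
    has_ad = has_rrsig = False
    for line in dig_output.splitlines():
        if line.startswith(";; flags:"):
            flags = line.split("flags:", 1)[1].split(";", 1)[0]
            has_ad = "ad" in flags.split()
        if "RRSIG" in line.split():
            has_rrsig = True
    return has_ad and has_rrsig
```

<p>Keep in mind the <code>ad</code> flag only means your <i>resolver</i> validated the chain; a client that trusts it still has to reach that resolver over a trustworthy path.</p>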
            <p>This is a big step towards our goal of doing with DNSSEC what we did with TLS: making it <b>easy and widespread</b>. We’re working on that and will get there soon.</p><p>DNSSEC presents <a href="/dnssec-complexities-and-considerations/">many complexities</a> that we are addressing by doing DNSSEC in a <b>modern</b> way: by <b>signing on the fly</b> we prevent NSEC records from <a href="/dnssec-complexities-and-considerations/#zonecontentexposure">revealing all of a zone’s subdomains</a>; by using <b>ECDSA</b> we make DNS answers smaller and reduce the risk of <a href="/dnssec-complexities-and-considerations/#reflectionamplificationthreat">reflection attacks</a>; and by providing a fully <b>managed</b> solution we take all of the complexity away from you.</p>
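<p>The size advantage of ECDSA is easy to quantify with back-of-the-envelope figures, assuming P-256 versus RSA-2048:</p>

```python
# Back-of-the-envelope numbers behind "smaller answers": an ECDSA P-256
# signature is two 32-byte integers (r, s), while an RSA-2048 signature
# is one modulus-sized value. Smaller RRSIGs mean smaller UDP responses,
# which also lowers the amplification factor available to reflection
# attacks that bounce signed answers off DNS servers.
ecdsa_p256_sig = 2 * 32      # 64 bytes
rsa_2048_sig = 2048 // 8     # 256 bytes
saving = 1 - ecdsa_p256_sig / rsa_2048_sig
print(f"each RRSIG shrinks by {rsa_2048_sig - ecdsa_p256_sig} bytes ({saving:.0%})")
```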
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6b4SxPn3YcA4X85aRfcXox/bda50db841b3f37c77d57124ad47979e/www-cloudflare-dnssec-auth-com---DNSViz-2015-01-29-01-47-50.png" />
            
            </figure><p><i>A visualization of the signatures on our domain. Source: </i><a href="http://dnsviz.net/d/www.cloudflare-dnssec-auth.com/dnssec/"><i>DNSViz</i></a></p><p>So let us know how this domain loads and validates for you. We’ll make sure to get you some stickers if you find an obscure bug!</p><p><i>UPDATE: The beta is full, thanks to all who are participating.</i></p><p>P.S. If you are a DNSSEC enthusiast and you want to be part of the public beta, just send an email to <a>dnssec-beta@cloudflare.com</a> with the name of your website and the answer to this question - the first ten people get in:</p><blockquote><p><s>What is the DNSSEC algorithm number for ECDSAP256SHA256?</s></p></blockquote> ]]></content:encoded>
            <category><![CDATA[DNSSEC]]></category>
            <category><![CDATA[Beta]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Programming]]></category>
            <guid isPermaLink="false">7sV3lRq0NXK6c6RhZdZYz9</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
    </channel>
</rss>