
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
    <channel>
        <title><![CDATA[ The Cloudflare Blog ]]></title>
        <description><![CDATA[ Get the latest news on how products at Cloudflare are built, technologies used, and join the teams helping to build a better Internet. ]]></description>
        <link>https://blog.cloudflare.com</link>
        <atom:link href="https://blog.cloudflare.com/" rel="self" type="application/rss+xml"/>
        <language>en-us</language>
        <image>
            <url>https://blog.cloudflare.com/favicon.png</url>
            <title>The Cloudflare Blog</title>
            <link>https://blog.cloudflare.com</link>
        </image>
        <lastBuildDate>Sun, 05 Apr 2026 19:12:27 GMT</lastBuildDate>
        <item>
            <title><![CDATA[TLS 1.3 explained by the Cloudflare Crypto Team at 33c3]]></title>
            <link>https://blog.cloudflare.com/tls-1-3-explained-by-the-cloudflare-crypto-team-at-33c3/</link>
            <pubDate>Wed, 01 Feb 2017 14:57:00 GMT</pubDate>
            <description><![CDATA[ Nick Sullivan and I gave a talk about TLS 1.3 at 33c3, the latest Chaos Communication Congress. The congress, attended by more than 13,000 hackers in Hamburg, has been one of the hallmark events of the security community for more than 30 years. ]]></description>
            <content:encoded><![CDATA[ <p><a href="/author/nick-sullivan/">Nick Sullivan</a> and I gave a talk about <a href="/tag/tls%201.3/">TLS 1.3</a> at <a href="https://events.ccc.de/tag/33c3/">33c3</a>, the latest Chaos Communication Congress. The congress, attended by more than 13,000 hackers in Hamburg, has been one of the hallmark events of the security community for more than 30 years.</p><p>You can watch the recording, or <a href="https://media.ccc.de/v/33c3-8348-deploying_tls_1_3_the_great_the_good_and_the_bad">download it in multiple formats and languages on the CCC website</a>.</p><p>The talk introduces TLS 1.3 and explains how it works in technical detail, why it is faster and more secure, and touches on its history and current status.</p><p>The <a href="https://speakerdeck.com/filosottile/tls-1-dot-3-at-33c3">slide deck is also online</a>.</p><p>This was an expanded and updated version of the <a href="/tls-1-3-overview-and-q-and-a/">internal talk previously transcribed on this blog</a>.</p>
    <div>
      <h3>TLS 1.3 hits Chrome and Firefox Stable</h3>
      <a href="#tls-1-3-hits-chrome-and-firefox-stable">
        
      </a>
    </div>
    <p>In related news, TLS 1.3 is reaching a percentage of Chrome and Firefox users this week, so websites with the Cloudflare TLS 1.3 beta enabled will load faster and more securely for all those new users.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/lIyLFsHXlAipFcgZ1nPWr/e71e81c8a7849214051b75430e1c169e/Screen-Shot-2017-01-30-at-20.14.53.png" />
            
            </figure><p>You can enable the TLS 1.3 beta from the Crypto section of your control panel.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7jji24riIIZQ2OEC6Xc93r/88d0ae02211b14fd407c065c5880ad31/image00.png" />
            
            </figure> ]]></content:encoded>
            <category><![CDATA[TLS 1.3]]></category>
            <category><![CDATA[TLS]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Chrome]]></category>
            <category><![CDATA[Firefox]]></category>
            <category><![CDATA[Beta]]></category>
            <category><![CDATA[Cryptography]]></category>
            <guid isPermaLink="false">2zdHVDhrFKGUtMgVjYallG</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[So you want to expose Go on the Internet]]></title>
            <link>https://blog.cloudflare.com/exposing-go-on-the-internet/</link>
            <pubDate>Mon, 26 Dec 2016 14:59:01 GMT</pubDate>
            <description><![CDATA[ Back when crypto/tls was slow and net/http young, the general wisdom was to always put Go servers behind a reverse proxy like NGINX. That's not necessary anymore! ]]></description>
            <content:encoded><![CDATA[ <p><i>This piece was </i><a href="https://blog.gopheracademy.com/advent-2016/exposing-go-on-the-internet/"><i>originally written</i></a><i> for the Gopher Academy advent series. We are grateful to them for allowing us to republish it here.</i></p><p>Back when <code>crypto/tls</code> was slow and <code>net/http</code> young, the general wisdom was to always put Go servers behind a reverse proxy like NGINX. That's not necessary anymore!</p><p>At Cloudflare we recently experimented with exposing pure Go services to the hostile wide area network. With the Go 1.8 release, <code>net/http</code> and <code>crypto/tls</code> proved to be stable, performant and flexible.</p><p>However, the defaults are tuned for local services. In this article we'll see how to tune and harden a Go server for Internet exposure.</p>
    <div>
      <h3><code>crypto/tls</code></h3>
      <a href="#crypto-tls">
        
      </a>
    </div>
    <p>You're not running an insecure HTTP server on the Internet in 2016. So you need <code>crypto/tls</code>. The good news is that it's <a href="/go-crypto-bridging-the-performance-gap/">now really fast</a> (as you've seen in a <a href="https://blog.gopheracademy.com/advent-2016/tls-termination-bench/">previous advent article</a>), and its security track record so far is excellent.</p><p>The default settings resemble the <i>Intermediate</i> recommended configuration of the <a href="https://wiki.mozilla.org/Security/Server_Side_TLS">Mozilla guidelines</a>. However, you should still set <code>PreferServerCipherSuites</code> to ensure safer and faster cipher suites are preferred, and <code>CurvePreferences</code> to avoid unoptimized curves: a client using <code>CurveP384</code> would cause up to a second of CPU to be consumed on our machines.</p>
            <pre><code>&amp;tls.Config{
	// Causes servers to use Go's default ciphersuite preferences,
	// which are tuned to avoid attacks. Does nothing on clients.
	PreferServerCipherSuites: true,
	// Only use curves which have assembly implementations
	CurvePreferences: []tls.CurveID{
		tls.CurveP256,
		tls.X25519, // Go 1.8 only
	},
}</code></pre>
            <p>If you can take the compatibility loss of the <i>Modern</i> configuration, you should then also set <code>MinVersion</code> and <code>CipherSuites</code>.</p>
            <pre><code>	MinVersion: tls.VersionTLS12,
	CipherSuites: []uint16{
		tls.TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,
		tls.TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
		tls.TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305, // Go 1.8 only
		tls.TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,   // Go 1.8 only
		tls.TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
		tls.TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,

		// Best disabled, as they don't provide Forward Secrecy,
		// but might be necessary for some clients
		// tls.TLS_RSA_WITH_AES_256_GCM_SHA384,
		// tls.TLS_RSA_WITH_AES_128_GCM_SHA256,
	},</code></pre>
            <p>Be aware that the Go implementation of the CBC cipher suites (the ones we disabled in <i>Modern</i> mode above) is vulnerable to the <a href="https://www.imperialviolet.org/2013/02/04/luckythirteen.html">Lucky13 attack</a>, even though <a href="https://github.com/golang/go/commit/f28cf8346c4ce7cb74bf97c7c69da21c43a78034">partial countermeasures were merged in 1.8</a>.</p><p>A final caveat: all these recommendations apply only to the amd64 architecture, for which <a href="/go-crypto-bridging-the-performance-gap/">fast, constant time implementations</a> of the crypto primitives (AES-GCM, ChaCha20-Poly1305, P256) are available. Other architectures are probably not fit for production use.</p><p>Since this server will be exposed to the Internet, it will need a publicly trusted certificate. You can get one easily and for free thanks to Let's Encrypt and the <a href="https://godoc.org/golang.org/x/crypto/acme/autocert"><code>golang.org/x/crypto/acme/autocert</code></a> package’s <code>GetCertificate</code> function.</p><p>Don't forget to redirect HTTP page loads to HTTPS, and consider <a href="https://www.owasp.org/index.php/HTTP_Strict_Transport_Security_Cheat_Sheet">HSTS</a> if your clients are browsers.</p>
            <pre><code>srv := &amp;http.Server{
	ReadTimeout:  5 * time.Second,
	WriteTimeout: 5 * time.Second,
	Handler: http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		w.Header().Set("Connection", "close")
		url := "https://" + req.Host + req.URL.String()
		http.Redirect(w, req, url, http.StatusMovedPermanently)
	}),
}
go func() { log.Fatal(srv.ListenAndServe()) }()</code></pre>
            <p>You can use the <a href="https://www.ssllabs.com/ssltest/">SSL Labs test</a> to check that everything is configured correctly.</p>
    <div>
      <h3><code>net/http</code></h3>
      <a href="#net-http">
        
      </a>
    </div>
    <p><code>net/http</code> is a mature HTTP/1.1 and HTTP/2 stack. You probably know how (and have opinions about how) to use the Handler side of it, so that's not what we'll talk about. We will instead talk about the Server side and what goes on behind the scenes.</p>
    <div>
      <h3>Timeouts</h3>
      <a href="#timeouts">
        
      </a>
    </div>
    <p>Timeouts are possibly the most dangerous edge case to overlook. Your service might get away with it on a controlled network, but it will not survive on the open Internet, especially (but not only) if maliciously attacked.</p><p>Applying timeouts is a matter of resource control. Even if goroutines are cheap, file descriptors are always limited. A connection that is stuck, is not making progress, or is maliciously stalling should not be allowed to consume them.</p><p>A server that has run out of file descriptors will fail to accept new connections with errors like</p>
            <pre><code>http: Accept error: accept tcp [::]:80: accept: too many open files; retrying in 1s</code></pre>
            <p>A zero/default <code>http.Server</code>, like the one used by the package-level helpers <code>http.ListenAndServe</code> and <code>http.ListenAndServeTLS</code>, comes with no timeouts. You don't want that.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/12k30eO5eG5vdJG4C7Yk6J/89cc5b4dc27bff41da1259bde71b278d/Timeouts.png" />
            
            </figure><p>There are three main timeouts exposed in <code>http.Server</code>: <code>ReadTimeout</code>, <code>WriteTimeout</code> and <code>IdleTimeout</code>. You set them by explicitly using a Server:</p>
            <pre><code>srv := &amp;http.Server{
    ReadTimeout:  5 * time.Second,
    WriteTimeout: 10 * time.Second,
    IdleTimeout:  120 * time.Second,
    TLSConfig:    tlsConfig,
    Handler:      serveMux,
}
log.Println(srv.ListenAndServeTLS("", ""))</code></pre>
            <p><code>ReadTimeout</code> covers the time from when the connection is accepted to when the request body is fully read (if you do read the body, otherwise to the end of the headers). It's implemented in <code>net/http</code> by calling <code>SetReadDeadline</code> <a href="https://github.com/golang/go/blob/3ba31558d1bca8ae6d2f03209b4cae55381175b3/src/net/http/server.go#L750">immediately after Accept</a>.</p><p>The problem with a <code>ReadTimeout</code> is that it doesn't allow a server to give the client more time to stream the body of a request based on the path or the content. Go 1.8 introduces <code>ReadHeaderTimeout</code>, which only covers up to the request headers. However, there's still no clear way to do reads with timeouts from a Handler. Different designs are being discussed in issue <a href="https://golang.org/issue/16100">#16100</a>.</p><p><code>WriteTimeout</code> normally covers the time from the end of the request header read to the end of the response write (a.k.a. the lifetime of the ServeHTTP), by calling <code>SetWriteDeadline</code> <a href="https://github.com/golang/go/blob/3ba31558d1bca8ae6d2f03209b4cae55381175b3/src/net/http/server.go#L753-L755">at the end of readRequest</a>.</p><p>However, when the connection is over HTTPS, <code>SetWriteDeadline</code> is called <a href="https://github.com/golang/go/blob/3ba31558d1bca8ae6d2f03209b4cae55381175b3/src/net/http/server.go#L1477-L1483">immediately after Accept</a> so that it also covers the packets written as part of the TLS handshake. 
Annoyingly, this means that (in that case only) <code>WriteTimeout</code> ends up including the header read and the first byte wait.</p><p>Similarly to <code>ReadTimeout</code>, <code>WriteTimeout</code> is absolute, with no way to manipulate it from a Handler (<a href="https://golang.org/issue/16100">#16100</a>).</p><p>Finally, Go 1.8 <a href="https://github.com/golang/go/issues/14204">introduces <code>IdleTimeout</code></a>, which limits, on the server side, how long a Keep-Alive connection will be kept idle before being reused. Before Go 1.8, the <code>ReadTimeout</code> would start ticking again immediately after a request completed, making it very hostile to Keep-Alive connections: the idle time would eat into the time the client should have had to send the next request, causing unexpected timeouts even for fast clients.</p><p>You should set <code>Read</code>, <code>Write</code> and <code>Idle</code> timeouts when dealing with untrusted clients and/or networks, so that a client can't hold up a connection by being slow to write or read.</p><p>For detailed background on HTTP/1.1 timeouts (up to Go 1.7) read <a href="/the-complete-guide-to-golang-net-http-timeouts/">my post on the Cloudflare blog</a>.</p>
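<p>To see these deadlines in action, here is a self-contained sketch (the timeout values and the loopback setup are illustrative, not a recommendation): a server with a short <code>ReadHeaderTimeout</code> answers a well-behaved client normally, while a client that stalls before sending its headers is disconnected once the timeout fires.</p>

```go
package main

import (
	"bufio"
	"fmt"
	"io/ioutil"
	"net"
	"net/http"
	"strings"
	"time"
)

// demo starts a server with a deliberately short ReadHeaderTimeout (Go 1.8+),
// then connects twice: once sending a full request, once stalling before the
// headers. It returns the status line the polite client saw, and whether the
// stalled client was hung up on roughly when the timeout fired.
func demo() (status string, dropped bool) {
	srv := &http.Server{
		ReadHeaderTimeout: 500 * time.Millisecond, // illustrative value
		Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprint(w, "ok")
		}),
	}
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	go srv.Serve(ln)

	// A well-behaved client gets its response.
	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		panic(err)
	}
	fmt.Fprint(conn, "GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")
	status, _ = bufio.NewReader(conn).ReadString('\n')
	conn.Close()

	// A client that never sends its headers stops consuming a file
	// descriptor once the timeout fires: the server closes the connection.
	stalled, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		panic(err)
	}
	start := time.Now()
	ioutil.ReadAll(stalled) // blocks until the server hangs up
	return strings.TrimSpace(status), time.Since(start) >= 400*time.Millisecond
}

func main() {
	status, dropped := demo()
	fmt.Println(status, dropped) // HTTP/1.1 200 OK true
}
```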
    <div>
      <h4>HTTP/2</h4>
      <a href="#http-2">
        
      </a>
    </div>
    <p>HTTP/2 is enabled automatically on any Go 1.6+ server if:</p><ul><li><p>the request is served over TLS/HTTPS</p></li><li><p><code>Server.TLSNextProto</code> is <code>nil</code> (setting it to an empty map is how you disable HTTP/2)</p></li><li><p><code>Server.TLSConfig</code> is set and <code>ListenAndServeTLS</code> is used <b>or</b></p></li><li><p><code>Serve</code> is used and <code>tls.Config.NextProtos</code> includes <code>"h2"</code> (like <code>[]string{"h2", "http/1.1"}</code>, since <code>Serve</code> is called <a href="https://github.com/golang/go/issues/15908">too late to auto-modify the TLS Config</a>)</p></li></ul><p>Timeouts have a slightly different meaning in HTTP/2, since the same connection can be serving different requests at the same time; however, in Go they are abstracted to the same set of Server timeouts.</p><p>Sadly, <code>ReadTimeout</code> breaks HTTP/2 connections in Go 1.7. Instead of being reset for each request it's set once at the beginning of the connection and never reset, breaking all HTTP/2 connections after the <code>ReadTimeout</code> duration. <a href="https://github.com/golang/go/issues/16450">It's fixed in 1.8</a>.</p><p>Between this and the inclusion of idle time in <code>ReadTimeout</code>, my recommendation is to <b>upgrade to 1.8 as soon as possible</b>.</p>
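<p>One way to check the negotiation end-to-end is <code>net/http/httptest</code>. Note this is a modern convenience: the <code>EnableHTTP2</code> knob on <code>httptest.Server</code> was added in a later Go release than the ones discussed here.</p>

```go
package main

import (
	"fmt"
	"io/ioutil"
	"net/http"
	"net/http/httptest"
)

// proto spins up a TLS test server with HTTP/2 enabled and reports the
// protocol the handler observed. httptest takes care of the self-signed
// certificate and of putting "h2" in tls.Config.NextProtos for us.
func proto() string {
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.Proto) // what ALPN negotiated, as seen by the handler
	}))
	srv.EnableHTTP2 = true
	srv.StartTLS()
	defer srv.Close()

	res, err := srv.Client().Get(srv.URL)
	if err != nil {
		panic(err)
	}
	defer res.Body.Close()
	body, _ := ioutil.ReadAll(res.Body)
	return string(body)
}

func main() {
	fmt.Println(proto()) // HTTP/2.0
}
```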
    <div>
      <h4>TCP Keep-Alives</h4>
      <a href="#tcp-keep-alives">
        
      </a>
    </div>
    <p>If you use <code>ListenAndServe</code> (as opposed to passing a <code>net.Listener</code> to <code>Serve</code>, which offers zero protection by default) a TCP Keep-Alive period of three minutes <a href="https://github.com/golang/go/blob/61db2e4efa2a8f558fd3557958d1c86dbbe7d3cc/src/net/http/server.go#L3023-L3039">will be set automatically</a>. That <i>will</i> help with clients that disappear completely off the face of the earth leaving a connection open forever, but I’ve learned not to trust that, and to set timeouts anyway.</p><p>To begin with, three minutes might be too high, which you can solve by implementing your own <a href="https://github.com/golang/go/blob/61db2e4efa2a8f558fd3557958d1c86dbbe7d3cc/src/net/http/server.go#L3023-L3039"><code>tcpKeepAliveListener</code></a>.</p><p>More importantly, a Keep-Alive only makes sure that the client is still responding, but does not place an upper limit on how long the connection can be held. A single malicious client can just open as many connections as your server has file descriptors, hold them half-way through the headers, respond to the rare keep-alives, and effectively take down your service.</p><p>Finally, in my experience connections tend to leak anyway until <a href="https://github.com/FiloSottile/Heartbleed/commit/4a3332ca1dc07aedf24b8540857792f72624cdf7">timeouts are in place</a>.</p>
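<p>A sketch of such a listener (the 30-second period is an arbitrary example; the unexported version inside <code>net/http</code> hardcodes the three minutes mentioned above):</p>

```go
package main

import (
	"fmt"
	"io/ioutil"
	"net"
	"net/http"
	"time"
)

// tcpKeepAliveListener mirrors the unexported listener net/http uses for
// ListenAndServe, but with a configurable (here, much shorter) period.
type tcpKeepAliveListener struct {
	*net.TCPListener
	period time.Duration
}

func (ln tcpKeepAliveListener) Accept() (net.Conn, error) {
	tc, err := ln.AcceptTCP()
	if err != nil {
		return nil, err
	}
	tc.SetKeepAlive(true)
	tc.SetKeepAlivePeriod(ln.period)
	return tc, nil
}

// serveOnce starts a server on the custom listener and performs one request
// against it, returning the response body.
func serveOnce() string {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	ln := tcpKeepAliveListener{l.(*net.TCPListener), 30 * time.Second}

	srv := &http.Server{Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "alive")
	})}
	go srv.Serve(ln)

	res, err := http.Get("http://" + ln.Addr().String())
	if err != nil {
		panic(err)
	}
	defer res.Body.Close()
	body, _ := ioutil.ReadAll(res.Body)
	return string(body)
}

func main() {
	fmt.Println(serveOnce()) // alive
}
```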
    <div>
      <h3>ServeMux</h3>
      <a href="#servemux">
        
      </a>
    </div>
    <p>Package level functions like <code>http.Handle[Func]</code> (and maybe your web framework) register handlers on the global <code>http.DefaultServeMux</code> which is used if <code>Server.Handler</code> is nil. You should avoid that.</p><p>Any package you import, directly or through other dependencies, has access to <code>http.DefaultServeMux</code> and might register routes you don't expect.</p><p>For example, if any package somewhere in the tree imports <code>net/http/pprof</code> clients will be able to get CPU profiles for your application. You can still use <code>net/http/pprof</code> by registering <a href="https://github.com/golang/go/blob/1106512db54fc2736c7a9a67dd553fc9e1fca742/src/net/http/pprof/pprof.go#L67-L71">its handlers</a> manually.</p><p>Instead, instantiate an <code>http.ServeMux</code> yourself, register handlers on it, and set it as <code>Server.Handler</code>. Or set whatever your web framework exposes as <code>Server.Handler</code>.</p>
    <div>
      <h3>Logging</h3>
      <a href="#logging">
        
      </a>
    </div>
    <p><code>net/http</code> does a number of things before yielding control to your handlers: <a href="https://github.com/golang/go/blob/1106512db54fc2736c7a9a67dd553fc9e1fca742/src/net/http/server.go#L2631-L2653"><code>Accept</code>s the connections</a>, <a href="https://github.com/golang/go/blob/1106512db54fc2736c7a9a67dd553fc9e1fca742/src/net/http/server.go#L1718-L1728">runs the TLS Handshake</a>, ...</p><p>If any of these steps goes wrong, a line is written directly to <code>Server.ErrorLog</code>. Some of these, like timeouts and connection resets, are expected on the open Internet. It's not clean, but you can intercept most of those and turn them into metrics by matching them with regexes from the Logger Writer, thanks to this guarantee:</p><blockquote><p>Each logging operation makes a single call to the Writer's Write method.</p></blockquote><p>To abort from inside a Handler without logging a stack trace you can either <code>panic(nil)</code> or in Go 1.8 <code>panic(http.ErrAbortHandler)</code>.</p>
    <div>
      <h3>Metrics</h3>
      <a href="#metrics">
        
      </a>
    </div>
    <p>A metric you'll want to monitor is the number of open file descriptors. <a href="https://github.com/prometheus/client_golang/blob/575f371f7862609249a1be4c9145f429fe065e32/prometheus/process_collector.go">Prometheus does that by using the <code>proc</code> filesystem</a>.</p><p>If you need to investigate a leak, you can use the <code>Server.ConnState</code> hook to get more detailed metrics of what stage the connections are in. However, note that there is no way to keep a correct count of <code>StateActive</code> connections without keeping state, so you'll need to maintain a <code>map[net.Conn]ConnState</code>.</p>
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>The days of needing NGINX in front of all Go services are gone, but you still need to take a few precautions on the open Internet, and probably want to upgrade to the shiny, new Go 1.8.</p><p>Happy serving!</p> ]]></content:encoded>
            <category><![CDATA[TLS]]></category>
            <category><![CDATA[HTTP2]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Programming]]></category>
            <category><![CDATA[Cryptography]]></category>
            <guid isPermaLink="false">5x4k8xtfrkPyeHS7HcYbF7</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Dyn issues affecting joint customers]]></title>
            <link>https://blog.cloudflare.com/dyn-issues-affecting-joint-customers/</link>
            <pubDate>Fri, 21 Oct 2016 18:22:55 GMT</pubDate>
            <description><![CDATA[ Today there is an ongoing, large scale Denial-of-Service attack directed against Dyn DNS. While Cloudflare services are operating normally, if you are using both Cloudflare and Dyn services, your website may be affected. ]]></description>
            <content:encoded><![CDATA[ <p>Today there is an ongoing, large scale Denial-of-Service attack directed against Dyn DNS. While Cloudflare services are operating normally, if you are using both Cloudflare and Dyn services, your website may be affected.</p><p>Specifically, if you are using CNAME records which point to a zone hosted on Dyn, our DNS queries directed to Dyn might fail, making your website unavailable and presenting a “1001” error message.</p><p>Some popular services that might rely on Dyn for part of their operations include GitHub Pages, Heroku, Shopify and AWS.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/8CBsEx31WrHrxQK8sV9fh/5590eaa0c9628a5a4eda472a1fd78ad6/Screen-Shot-2016-10-21-at-19.08.14.png" />
            
            </figure><p>As a possible workaround, you might be able to update your Cloudflare DNS records from CNAMEs (referring to Dyn hosted records) to A/AAAA records specifying the origin IP of your website. This will allow Cloudflare to reach your origin without the need for an external DNS lookup.</p><p>Note that if you use different origin IP addresses, for example based on the geographical location, you may lose some of that functionality by using plain A/AAAA records. We recommend that you provide addresses for many of your different locations, so that load will be shared amongst them.</p><p>Customers with a CNAME setup (which means Cloudflare is not configured in your domain NS records) where the main zone is hosted on a Dyn service will be affected as well. You might be able to make Cloudflare your authoritative DNS provider by contacting support and asking to be changed to Full mode and then updating your nameservers at your <a href="https://www.cloudflare.com/learning/dns/glossary/what-is-a-domain-name-registrar/">registrar</a>, but the change can take up to 48h to propagate.</p><p>Please note that the Cloudflare status page and support system might be affected by the ongoing attack, since they are hosted on third parties, as per industry best practices.</p> ]]></content:encoded>
            <category><![CDATA[DDoS]]></category>
            <category><![CDATA[Attacks]]></category>
            <category><![CDATA[Outage]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">1eSNR5scDOqUvCjp3j813W</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[TLS nonce-nse]]></title>
            <link>https://blog.cloudflare.com/tls-nonce-nse/</link>
            <pubDate>Wed, 12 Oct 2016 15:05:00 GMT</pubDate>
            <description><![CDATA[ One of the base principles of cryptography is that you can't just encrypt multiple messages with the same key. At the very least, what will happen is that two messages that have identical plaintext will also have identical ciphertext, which is a dangerous leak.  ]]></description>
            <content:encoded><![CDATA[ <p>One of the base principles of cryptography is that you can't <i>just</i> encrypt multiple messages with the same key. At the very least, what will happen is that two messages that have identical plaintext will also have identical ciphertext, which is a dangerous leak. (This is similar to why you can't encrypt blocks with <a href="https://blog.filippo.io/the-ecb-penguin/">ECB</a>.)</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7e62kSNRA1U4fnYOyOzcnf/f7170ef03a53336e04f0ce640fe779fe/19fq1n.jpg" />
            
            </figure><p>If you think about it, a pure encryption function is just like any other pure computer function: deterministic. Given the same set of inputs (key and message) it will always return the same output (the encrypted message). And we don't want an attacker to be able to tell that two encrypted messages came from the same plaintext.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2ypDxOdFtrCyJKATmJ8MrZ/6993281f817e294e4c93aded559575b4/Nonces.001-1.png" />
            
            </figure><p>The solution is the use of IVs (Initialization Vectors) or nonces (numbers used once). These are byte strings that are different for each encrypted message. They are the source of non-determinism that is needed to make duplicates indistinguishable. They are usually not secret, and are distributed prepended to the ciphertext, since they are necessary for decryption.</p><p>The distinction between IVs and nonces is controversial and not binary. Different encryption schemes require different properties to be secure: some just need them to never repeat, in which case we commonly refer to them as nonces; some also need them to be random, or even unpredictable, in which case we commonly call them IVs.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3zLMUUtK3IC6m9cEqDEeN8/b1e3f972e89120f0251a86eb9b9043dd/Nonces.002-1.png" />
            
            </figure>
    <div>
      <h3>Nonces in TLS</h3>
      <a href="#nonces-in-tls">
        
      </a>
    </div>
    <p>TLS at its core is about encrypting a stream of packets, or more properly "records". The initial handshake takes care of authenticating the connection and generating the keys, but then it's up to the record layer to encrypt many records with that same key. Enter nonces.</p><p>Nonce management can be a hard problem, but TLS is close to the best case: keys are never reused across connections, and the records have sequence numbers that both sides keep track of. However, it took the protocol a few revisions to fully take advantage of this.</p><p>The resulting landscape is a bit confusing (including one or two attack names):</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6SKHBZfapSldYN2czxNNuH/7f2d30dd41d885dc13b76d9ac6668c35/Nonces-table.png" />
            
            </figure>
    <div>
      <h4>RC4 and stream ciphers</h4>
      <a href="#rc4-and-stream-ciphers">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/VnXTZTwdpRaoyqySKVcDn/ae6e025045313ccfc7da498555196386/Nonces-RC4-black.png" />
            
            </figure><p>RC4 is a stream cipher, so it doesn't have to treat records separately. The cipher generates a continuous keystream which is XOR'd with the plaintexts as if they were just portions of one big message. Hence, there are no nonces.</p><p>RC4 <a href="/tag/rc4/">is broken</a> and was removed from TLS 1.3.</p>
    <div>
      <h4>CBC in TLS 1.0</h4>
      <a href="#cbc-in-tls-1-0">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4qMGKsTc0nc47WCKPrCvjk/f5dbe96ec072da2153b4549c5094d454/Nonces-CBC-1.0-black-1.png" />
            
            </figure><p>CBC in TLS 1.0 works similarly to RC4: the cipher is instantiated once, and then the records are encrypted as part of one continuous message.</p><p>Sadly that means that the IV for the next record <a href="https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Cipher_Block_Chaining_.28CBC.29">is the last block of ciphertext of the previous record</a>, which the attacker can observe. Being able to predict the IV breaks CBC security, and that led to the <a href="https://www.imperialviolet.org/2011/09/23/chromeandbeast.html">BEAST attack</a>. BEAST is mitigated by <a href="https://www.imperialviolet.org/2012/01/15/beastfollowup.html">splitting records in two</a>, which effectively randomizes the IV, but this is a client-side fix, out of the server control.</p>
    <div>
      <h4>CBC in TLS 1.1+</h4>
      <a href="#cbc-in-tls-1-1">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5OaNhFttOtRZjYfEgeIq6v/e7a957af57c1116317a13befa7d5f6b6/Nonces-CBC-explicit-black.png" />
            
            </figure><p>TLS 1.1 fixed BEAST by simply making IVs explicit, sending the IV with each record (with the network overhead that comes with that).</p><p>AES-CBC IVs are 16 bytes (128 bits), so using random bytes is sufficient to prevent collisions.</p><p>CBC has <a href="/yet-another-padding-oracle-in-openssl-cbc-ciphersuites/">other nasty design issues</a> and has been removed in TLS 1.3.</p>
    <div>
      <h4>TLS 1.2 GCM</h4>
      <a href="#tls-1-2-gcm">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3bbl3iFzHYooGKXgbGZYCo/389f3e082c1097f97933391d8c7ed9fd/Nonces-GCM-black-2.png" />
            
            </figure><p>TLS 1.2 inherited the 1.1 explicit IVs. It also introduced <a href="/it-takes-two-to-chacha-poly/">AEADs</a> like AES-GCM. The record nonce in 1.2 AES-GCM is a concatenation of a fixed per-connection IV (4 bytes, derived at the same time as the key) and an explicit per-record nonce (8 bytes, sent on the wire).</p><p>Since <a href="https://en.wikipedia.org/wiki/Birthday_problem">8 random bytes is too short to guarantee uniqueness</a>, 1.2 GCM implementations have to use the sequence number or a counter. If you are thinking "but what sense does it make to use an explicit IV, sent on the wire, which is just the sequence number that both parties know anyway", well... yeah.</p><p>Implementations not using a counter/sequence-based AES-GCM nonce were found to be indeed vulnerable by the "<a href="https://github.com/nonce-disrespect/nonce-disrespect">Nonce-Disrespecting Adversaries</a>" paper.</p>
    <div>
      <h4>TLS 1.3</h4>
      <a href="#tls-1-3">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6X6R46MkjuADDzdspGpNAp/313ba89f66a1a304c169d1e9cd801eaf/Nonces-1.3-black-1.png" />
            
            </figure><p>TLS 1.3 finally took advantage of the sequential nature of TLS records and removed the free-form explicit IVs. It instead uses a combination of a fixed per-connection IV (derived at the same time as the key) and the sequence number, XORed together rather than concatenated.</p><p>This way the entire length of the nonce is random-looking, nonces can never be reused as the sequence number monotonically increases, and there is no network overhead.</p>
    <div>
      <h4>ChaCha20-Poly1305</h4>
      <a href="#chacha20-poly1305">
        
      </a>
    </div>
    <p>The <a href="/do-the-chacha-better-mobile-performance-with-cryptography/">ChaCha20-Poly1305 ciphersuite</a> uses the same "fixed IV XORed with the sequence number" scheme as TLS 1.3, even when used in TLS 1.2.</p><p>While 1.3 AEADs and 1.2 ChaCha20 use the same nonce scheme, when used in 1.2, ChaCha20 still puts the sequence number, type, version and length in the additional authenticated data. 1.3 makes all those either implicit or part of the encrypted payload.</p>
    <div>
      <h3>To recap</h3>
      <a href="#to-recap">
        
      </a>
    </div>
    <ul><li><p>RC4 is a stream cipher, so it has no per-record nonce.</p></li><li><p>CBC in TLS 1.0 used to work similarly to RC4. Sadly, that was vulnerable to BEAST.</p></li><li><p>TLS 1.1 fixed BEAST by simply making IVs explicit and random.</p></li><li><p>TLS 1.2 AES-GCM uses a concatenation of a fixed IV and an explicit sequential nonce.</p></li><li><p>TLS 1.3 finally uses a simple fixed IV XORed with the sequence number.</p></li><li><p>ChaCha20-Poly1305 uses the same scheme as TLS 1.3 even when used in TLS 1.2.</p></li></ul>
    <div>
      <h2>Nonce misuse resistance</h2>
      <a href="#nonce-misuse-resistance">
        
      </a>
    </div>
    <p>In the introduction we used the case of two identical messages encrypted with the same key to illustrate the most intuitive issue of missing or reused nonces. However, depending on the cipher, other things can go wrong when the same nonce is reused, or is predictable.</p><p>A repeated nonce often entirely breaks the security properties of the connection. For example, AES-GCM <a href="https://github.com/nonce-disrespect/nonce-disrespect">leaks the authentication key altogether</a>, allowing an attacker to fake packets and inject data.</p><p>As part of the trend of making cryptography primitives less dangerous for implementers to use, research is focusing on mitigating the adverse consequences of nonce reuse. The property of these new schemes is called <a href="https://www.lvh.io/posts/nonce-misuse-resistance-101.html">nonce misuse resistance</a>.</p><p>However, these schemes have yet to see wider adoption and standardization, which is why a solid protocol design like the one in TLS 1.3 is critical to prevent this class of attacks.</p><p><i>Does painting overviews of technical topics like this sound satisfying to you? </i><a href="https://www.cloudflare.com/join-our-team/"><i>We are hiring in London, Austin (TX), Champaign (IL), San Francisco and Singapore</i></a><i>!</i></p> ]]></content:encoded>
            <category><![CDATA[TLS]]></category>
            <category><![CDATA[TLS 1.3]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Encryption]]></category>
            <category><![CDATA[Cryptography]]></category>
            <guid isPermaLink="false">1xqvUPkstNMoVVXMQ3fy5C</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[An overview of TLS 1.3 and Q&A]]></title>
            <link>https://blog.cloudflare.com/tls-1-3-overview-and-q-and-a/</link>
            <pubDate>Fri, 23 Sep 2016 16:01:18 GMT</pubDate>
            <description><![CDATA[ The CloudFlare London office hosts weekly internal Tech Talks (with free lunch picked by the speaker). My recent one was an explanation of the latest version of TLS, 1.3, how it works and why it's faster and safer. ]]></description>
            <content:encoded><![CDATA[ <p>The CloudFlare London office hosts weekly internal Tech Talks (with free lunch picked by the speaker). My recent one was an explanation of the latest version of <a href="https://www.cloudflare.com/ssl/">TLS, 1.3</a>, how it works and why it's faster and safer.</p><p>You can <a href="https://vimeo.com/177333631">watch the complete talk</a> below or just read my summarized transcript.</p><p><i>Update: you might want to watch my more recent and extended </i><a href="/tls-1-3-explained-by-the-cloudflare-crypto-team-at-33c3/"><i>33c3 talk</i></a><i> instead.</i></p><p><b>The Q&amp;A session is open!</b> Send us your questions about TLS 1.3 at <a>tls13@cloudflare.com</a> or leave them in the Disqus comments below and I'll answer them in an upcoming blog post.</p><p>.post-content iframe { margin: 0; }</p>
    <div>
      <h4>Summarized transcript</h4>
      <a href="#summarized-transcript">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7Bq6YFjnvg3SB2kKRjfNsr/4be9ff4c51df8eb9b9dbd9eb6f499f13/TLS-1.3.003.png" />
            
            </figure><p>To understand why TLS 1.3 is awesome, we need to take a step back and look at how TLS 1.2 works. In particular we will look at modern TLS 1.2, the kind that a recent browser would use when connecting to the CloudFlare edge.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/8ohjI9yrzu2eEm0U6iOu9/654853ffc8128cef9c4f17ac6dfb28cb/TLS-1.3.004.png" />
            
            </figure><p>The client starts by sending a message called the <code>ClientHello</code> that essentially says "hey, I want to speak TLS 1.2, with one of these cipher suites".</p><p>The server receives that and answers with a <code>ServerHello</code> that says "sure, let's speak TLS 1.2, and I pick <i>this</i> cipher suite".</p><p>Along with that the server sends its <i>key share</i>. The specifics of this key share change based on what cipher suite was selected. When using ECDHE, key shares are mixed with the <a href="/a-relatively-easy-to-understand-primer-on-elliptic-curve-cryptography/">Elliptic Curve Diffie Hellman</a> algorithm.</p><p>The important part to understand is that for the client and server to agree on a cryptographic key, they need to receive each other's portion, or share.</p><p>Finally, the server sends the website certificate (signed by the CA) and a signature on portions of <code>ClientHello</code> and <code>ServerHello</code>, including the key share, so that the client knows that those are authentic.</p><p>The client receives all that, and <i>then</i> generates its own key share, mixes it with the server key share, and thus generates the encryption keys for the session.</p><p>Finally, the client sends the server its key share, enables encryption and sends a <code>Finished</code> message (which is a hash of a transcript of what happened so far). The server does the same: it mixes the key shares to get the key and sends its own <code>Finished</code> message.</p><p>At that point we are done, and we can finally send useful data encrypted on the connection.</p><p>Notice that this takes two round-trips between the client and the server before the HTTP request can be transferred. And <a href="https://www.cloudflare.com/learning/cdn/glossary/round-trip-time-rtt/">round-trips on the Internet</a> can be slow.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3d2jAu0W5lMg2PQm8zrdtz/221af7783bc17706e2d3a2587ecf1312/TLS-1.3.006.png" />
            
            </figure><p>Enter TLS 1.3. While TLS 1.0, 1.1 and 1.2 are not that different, 1.3 is a big jump.</p><p>Most importantly, establishing a TLS 1.3 connection takes <b>one less round-trip</b>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1hr3R0qLxh2XIklK8lNJdh/8b4cbbea91b113ba182294c1bef64695/TLS-1.3.007.png" />
            
            </figure><p>In TLS 1.3 a client doesn't just send the <code>ClientHello</code> and the list of supported ciphers; it also makes a guess as to which key agreement algorithm the server will choose, and <b>sends a key share for that</b>.</p><p>(<i>Note: the video calls the key agreement algorithm "cipher suite". In the meantime the specification has been changed to separate supported cipher suites like AES-GCM-SHA256 and supported key agreements like ECDHE P-256.</i>)</p><p>And that saves us a round trip, because as soon as the server selects the cipher suite and key agreement algorithm, it's ready to generate the key, as it already has the client key share. So it can switch to encrypted packets one whole round-trip in advance.</p><p>So the server sends the <code>ServerHello</code>, its key share, the certificate (now encrypted, since it has a key!), and already the <code>Finished</code> message.</p><p>The client receives all that, generates the keys using the key share, checks the certificate and <code>Finished</code>, and it's immediately ready to send the HTTP request, after only one round-trip. That saved round-trip can be worth hundreds of milliseconds.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/5Uvkj3F2TDTPupYTuJOCdE/03d16a27196bb78a5cafebecf12e9b34/TLS-1.3.009.png" />
            
            </figure><p>One existing way to speed up TLS connections is called resumption. It's what happens when the client has connected to that server before, and uses what they remember from the last time to cut short the handshake.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7Kb8uoWeYCfHtOsR4jKTig/4e5ea4f6cdd0fd2498db35e1c51ecdac/TLS-1.3.010.png" />
            
            </figure><p>How this worked in TLS 1.2 is that servers would send the client either a <a href="/tls-session-resumption-full-speed-and-secure/">Session ID or a Session Ticket</a>. The former is just a reference number that the server can trace back to a session, while the latter is an encrypted serialized session which allows the server not to keep state.</p><p>The next time the client would connect, it would send the Session ID or Ticket in the <code>ClientHello</code>, and the server would go like "hey, I know you, we have agreed on a key already", skip the whole key shares dance, and jump straight to <code>Finished</code>, saving a round-trip.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3T989Gfy9u4y0b5WlC2YyJ/926f71e94f528ae25d275bad8399e0fd/TLS-1.3.011.png" />
            
            </figure><p>So, we have a way to do 1-RTT connections in 1.2 if the client has connected before, which is very common. Then what does 1.3 gain us? When resumption is available, <b>1.3 allows us to do 0-RTT connections</b>, again saving one round trip and ending up with no round trip at all.</p><p>If you have connected to a 1.3 server before you can immediately start sending encrypted data, like an HTTP request, without any round-trip at all, making TLS essentially <b>zero overhead</b>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4x3MsWrJJB60co3tWBTcfD/cc3976fa1688d6bcb39f4fdae04c06ae/TLS-1.3.012.png" />
            
            </figure><p>When a 1.3 client connects to a 1.3 server they agree on a resumption key (or PSK, pre-shared key), and the server gives the client a Session Ticket that will help it remember it. The Ticket can be an encrypted copy of the PSK—to avoid state—or a reference number.</p><p>The next time the client connects, it sends the Session Ticket in the <code>ClientHello</code> and then immediately, without waiting for any round trip, sends the HTTP request encrypted with the PSK. The server figures out the PSK from the Session Ticket and uses that to decrypt the 0-RTT data.</p><p>The client also sends a key share, so that client and server can switch to a new fresh key for the actual HTTP response and the rest of the connection.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3bxXx5kuJvmtOYmxa1JNcy/1cdbfb3ee46d508d6eabcafd44e043b7/TLS-1.3.013.png" />
            
            </figure><p>0-RTT comes with a couple of caveats.</p><p>Since the PSK is not agreed upon with a fresh round of Diffie Hellman, it does not provide Forward Secrecy against a compromise of the Session Ticket key. That is, if in a year an attacker somehow obtains the Session Ticket key, it can decrypt the Session Ticket, obtain the PSK and decrypt the 0-RTT data the client sent (but not the rest of the connection).</p><p>This is why it's important to rotate Session Ticket keys often and not persist them (CloudFlare rotates these keys hourly).</p><p>TLS 1.2 has never provided any Forward Secrecy against a compromise of the Session Ticket key at all, so even with 0-RTT 1.3 is an improvement upon 1.2.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/79qHz08hcnfJAk8ogogTfH/1822898b0fc6f0262a2b9c7ae9df0eba/TLS-1.3.014.png" />
            
            </figure><p>More problematic are replay attacks.</p><p>Since with Session Tickets servers are stateless, they have no way to know if a packet of 0-RTT data was already sent before.</p><p>Imagine that the 0-RTT data a client sent is not an HTTP GET ("hey, send me this page") but instead an HTTP POST executing a transaction like "hey, send Filippo 50$". If I'm in the middle I can intercept that <code>ClientHello</code>+0-RTT packet, and then re-send it to the server 100 times. No need to know any key. I now have 5000$.</p><p>Every time, the server will see a Session Ticket, unwrap it to find the PSK, use the PSK to decrypt the 0-RTT data and find the HTTP POST inside, with no way to know something is fishy.</p><p>The solution is that servers must not execute non-idempotent operations received in 0-RTT data. Instead, in those cases, they should force the client to perform a full 1-RTT handshake. That protects from replay since each <code>ClientHello</code> and <code>ServerHello</code> come with a Random value and connections have sequence numbers, so there's no way to replay recorded traffic verbatim.</p><p>Thankfully, most times the first request a client sends is not a state-changing transaction, but something idempotent like a GET.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2oeeDc9HROItV41GXxqggs/12f35c007af3f031ab681968ff807dad/TLS-1.3.016.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4u8bypRISwPgvFeXhAZfuU/bbcdaf44a4b89aa05390b5e85a4a1a5a/TLS-1.3.017.png" />
            
            </figure>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/45Ejce1oMIH6MiWRDtalo3/537497589b0bf99b28ec8b8488c8279e/TLS-1.3.018.png" />
            
            </figure><p>TLS 1.3 is not only good for cutting a round-trip. It's also better, more robust crypto all around.</p><p>Most importantly, many things were removed. 1.3 marked a shift in the design approach: it used to be the case that the TLS committee would accept any proposal that made sense, and implementations like OpenSSL would add support for it. Think, for example, of Heartbeats, the rarely used feature that caused <a href="/the-results-of-the-cloudflare-challenge/">Heartbleed</a>.</p><p>In 1.3, everything was scrutinized for being really necessary and secure, and scrapped otherwise. A lot of things are gone:</p><ul><li><p>the old <a href="/keyless-ssl-the-nitty-gritty-technical-details/">static RSA handshake without Diffie Hellman</a>, which doesn't offer Forward Secrecy</p></li><li><p>the <a href="/padding-oracles-and-the-decline-of-cbc-mode-ciphersuites/">CBC MAC-then-Encrypt modes</a>, which were responsible for Vaudenay, Lucky13, POODLE, <a href="/yet-another-padding-oracle-in-openssl-cbc-ciphersuites/">LuckyMinus20</a>... replaced by <a href="/go-crypto-bridging-the-performance-gap/">AEADs</a></p></li><li><p>weak primitives like <a href="/killing-rc4-the-long-goodbye/">RC4</a>, SHA1, MD5</p></li><li><p>compression</p></li><li><p>renegotiation</p></li><li><p>custom FFDHE groups</p></li><li><p>RSA PKCS#1v1.5</p></li><li><p>explicit nonces</p></li></ul><p>We'll go over these in more detail in future blog posts.</p><p>Some of these were not necessarily broken by design, but they were dangerous, hard to implement correctly and easy to get wrong. The excellent new trend of TLS 1.3 and cryptography in general is to make mistakes less likely at the design stage, since humans are not perfect.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/56pPwUsCFfumH2fYMeL8FN/35aa6d17ae21394702ce0122e6bdfb5c/TLS-1.3.019.png" />
            
            </figure><p>A new version of a protocol obviously can't dictate how older implementations behave, and 1.3 can't improve the security of 1.2 systems. So how do you make sure that if tomorrow TLS 1.2 is completely broken, a client and server that both support 1.2 and 1.3 can't be tricked into using 1.2 by a proxy?</p><p>A MitM could change the <code>ClientHello</code> to say "I want to talk at most TLS 1.2", and then use whichever attack it discovered to make the 1.2 connection succeed even if it tampered with a piece of the handshake.</p><p>1.3 has a clever solution to this: if a 1.3 server has to use 1.2 because it looks like the client doesn't support 1.3, it will "hide a message" in the Server Random value. A real 1.2 client will completely ignore it, but a client that supports 1.3 would know to look for it, and would discover that it's being tricked into downgrading to 1.2.</p><p>The Server Random is signed with the certificate in 1.2, so it's impossible to fake even if pieces of 1.2 are broken. This is very important because it will allow us to keep supporting 1.2 in the future even if it's found to be weaker, unlike what we had to do with <a href="/sslv3-support-disabled-by-default-due-to-vulnerability/">SSLv3 and POODLE</a>. With 1.3 we will know for sure that clients that can do any better are not being put at risk, allowing us to make sure <a href="/ensuring-that-the-web-is-for-everyone/">the Internet is for Everyone</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2Q7LYbjQBvS2WP0PUvkmdk/6fb34646631b248c2d3750b4b2c198c5/TLS-1.3.020.png" />
            
            </figure><p>So this is TLS 1.3. Meant to be a solid, safe, robust, simple, essential foundation for Internet encryption for the years to come. And it's faster, so that no one will have performance reasons not to implement it.</p><p>TLS 1.3 is still a draft and it might change before being finalized, but at CloudFlare we are actively developing a 1.3 stack compatible with current experimental browsers, so <a href="/introducing-tls-1-3/">everyone can get it today</a>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3xQPlSJ4uBQaS60OpFogxD/7967fae7cd0f8766658a02c0534b95c5/TLS-1.3.023.png" />
            
            </figure><p>The TLS 1.3 spec is <a href="https://github.com/tlswg/tls13-spec">on GitHub</a>, so anyone can contribute. Just while making the slides for this presentation I noticed I was having a hard time understanding a system because a diagram was missing some details, so I submitted a PR to fix it. How easy is that!?</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/3Ecgvi8YtzFKhcoQjoPLzU/cb6a281fbd8246c453bc145ca36a7b56/TLS-1.3.026.png" />
            
            </figure><p>Like any talk, at the end there's the Q&amp;A. Send your questions to <a>tls13@cloudflare.com</a> or leave them in the Disqus comments below and I'll answer them in an upcoming blog post!</p> ]]></content:encoded>
            <category><![CDATA[TLS]]></category>
            <category><![CDATA[TLS 1.3]]></category>
            <category><![CDATA[Events]]></category>
            <category><![CDATA[United Kingdom]]></category>
            <category><![CDATA[Security]]></category>
            <category><![CDATA[Cryptography]]></category>
            <guid isPermaLink="false">6CKPHn0MEFqMdmC3vvaDvo</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[The complete guide to Go net/http timeouts]]></title>
            <link>https://blog.cloudflare.com/the-complete-guide-to-golang-net-http-timeouts/</link>
            <pubDate>Wed, 29 Jun 2016 13:09:27 GMT</pubDate>
            <description><![CDATA[ When writing an HTTP server or client in Go, timeouts are amongst the easiest and most subtle things to get wrong: there are many to choose from, and a mistake can have no consequences for a long time, until the network glitches and the process hangs. ]]></description>
            <content:encoded><![CDATA[ <p>When writing an HTTP server or client in Go, timeouts are amongst the easiest and most subtle things to get wrong: there are many to choose from, and a mistake can have no consequences for a long time, until the network glitches and the process hangs.</p><p><a href="https://www.cloudflare.com/learning/ddos/glossary/hypertext-transfer-protocol-http/">HTTP</a> is a complex multi-stage protocol, so there's no one-size-fits-all solution to timeouts. Think about a <a href="https://www.cloudflare.com/learning/video/what-is-streaming/">streaming</a> endpoint versus a JSON API versus a <a href="https://en.wikipedia.org/wiki/Comet_%28programming%29">Comet</a> endpoint. Indeed, the defaults are often not what you want.</p><p>In this post I’ll take apart the various stages you might need to apply a timeout to, and look at the different ways to do it, on both the Server and the Client side.</p>
    <div>
      <h3>SetDeadline</h3>
      <a href="#setdeadline">
        
      </a>
    </div>
    <p>First, you need to know about the network primitive that Go exposes to implement timeouts: Deadlines.</p><p>Exposed by <a href="https://golang.org/pkg/net/#Conn"><code>net.Conn</code></a> with the <code>Set[Read|Write]Deadline(time.Time)</code> methods, Deadlines are an absolute point in time which, when reached, makes all I/O operations fail with a timeout error.</p><p><b>Deadlines are not timeouts.</b> Once set they stay in force forever (or until the next call to <code>SetDeadline</code>), no matter if and how the connection is used in the meantime. So to build a timeout with <code>SetDeadline</code> you'll have to call it before <i>every</i> <code>Read</code>/<code>Write</code> operation.</p><p>You probably don't want to call <code>SetDeadline</code> yourself; let <code>net/http</code> call it for you instead, using its higher-level timeouts. However, keep in mind that all timeouts are implemented in terms of Deadlines, so they <b>do NOT reset every time data is sent or received</b>.</p>
    <div>
      <h3>Server Timeouts</h3>
      <a href="#server-timeouts">
        
      </a>
    </div>
    <p><i>The </i><a href="/exposing-go-on-the-internet/"><i>"So you want to expose Go on the Internet" post</i></a><i> has more information on server timeouts, in particular about HTTP/2 and Go 1.7 bugs.</i></p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/sOlH0CkRueHYFHY9JhYag/3688eee879f10624eb42b75302000929/Timeouts-001.png" />
            
            </figure><p>It's critical for an HTTP server exposed to the Internet to enforce timeouts on client connections. Otherwise very slow or disappearing clients might leak file descriptors and eventually result in something along the lines of:</p>
            <pre><code>http: Accept error: accept tcp [::]:80: accept4: too many open files; retrying in 5ms</code></pre>
            <p>There are two timeouts exposed in <code>http.Server</code>: <code>ReadTimeout</code> and <code>WriteTimeout</code>. You set them by explicitly using a Server:</p>
            <pre><code>srv := &amp;http.Server{
    ReadTimeout: 5 * time.Second,
    WriteTimeout: 10 * time.Second,
}
log.Println(srv.ListenAndServe())</code></pre>
            <p><code>ReadTimeout</code> covers the time from when the connection is accepted to when the request body is fully read (if you do read the body, otherwise to the end of the headers). It's implemented in <code>net/http</code> by calling <code>SetReadDeadline</code> <a href="https://github.com/golang/go/blob/3ba31558d1bca8ae6d2f03209b4cae55381175b3/src/net/http/server.go#L750">immediately after Accept</a>.</p><p><code>WriteTimeout</code> normally covers the time from the end of the request header read to the end of the response write (a.k.a. the lifetime of the ServeHTTP), by calling <code>SetWriteDeadline</code> <a href="https://github.com/golang/go/blob/3ba31558d1bca8ae6d2f03209b4cae55381175b3/src/net/http/server.go#L753-L755">at the end of readRequest</a>.</p><p>However, when the connection is <a href="https://www.cloudflare.com/learning/ssl/what-is-https/">HTTPS</a>, <code>SetWriteDeadline</code> is called <a href="https://github.com/golang/go/blob/3ba31558d1bca8ae6d2f03209b4cae55381175b3/src/net/http/server.go#L1477-L1483">immediately after Accept</a> so that it also covers the packets written as part of the TLS handshake. Annoyingly, this means that (in that case only) <code>WriteTimeout</code> ends up including the header read and the first byte wait.</p><p>You should set both timeouts when you deal with untrusted clients and/or networks, so that a client can't hold up a connection by being slow to write or read.</p><p>Finally, there's <a href="https://golang.org/pkg/net/http/#TimeoutHandler"><code>http.TimeoutHandler</code></a>. It’s not a Server parameter, but a Handler wrapper that limits the maximum duration of <code>ServeHTTP</code> calls. It works by buffering the response, and sending a <i>504 Gateway Timeout</i> instead if the deadline is exceeded. Note that it is <a href="https://github.com/golang/go/issues/15327">broken in 1.6 and fixed in 1.6.2</a>.</p>
    <div>
      <h4>http.ListenAndServe is doing it wrong</h4>
      <a href="#http-listenandserve-is-doing-it-wrong">
        
      </a>
    </div>
    <p>Incidentally, this means that the package-level convenience functions that bypass <code>http.Server</code> like <code>http.ListenAndServe</code>, <code>http.ListenAndServeTLS</code> and <code>http.Serve</code> are unfit for public Internet servers.</p><p>Those functions leave the timeouts at their default off value, with no way of enabling them, so if you use them you'll soon be leaking connections and run out of file descriptors. I've made this mistake at least half a dozen times.</p><p>Instead, create an <code>http.Server</code> instance with <code>ReadTimeout</code> and <code>WriteTimeout</code> and use its corresponding methods, like in the example a few paragraphs above.</p>
    <div>
      <h4>About streaming</h4>
      <a href="#about-streaming">
        
      </a>
    </div>
    <p>Very annoyingly, there is no way of accessing the underlying <code>net.Conn</code> from <code>ServeHTTP</code> so a server that intends to stream a response is forced to unset the <code>WriteTimeout</code> (which is also possibly why they are 0 by default). This is because without <code>net.Conn</code> access, there is no way of calling <code>SetWriteDeadline</code> before each <code>Write</code> to implement a proper idle (not absolute) timeout.</p><p>Also, there's no way to cancel a blocked <code>ResponseWriter.Write</code> since <code>ResponseWriter.Close</code> (which you can access via an interface upgrade) is not documented to unblock a concurrent Write. So there's no way to build a timeout manually with a Timer, either.</p><p>Sadly, this means that streaming servers can't really defend themselves from a slow-reading client.</p><p>I submitted <a href="https://github.com/golang/go/issues/16100">an issue with some proposals</a>, and I welcome feedback there.</p>
    <div>
      <h3>Client Timeouts</h3>
      <a href="#client-timeouts">
        
      </a>
    </div>
    
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/255U6cjjLCH1vUEJoA9QWo/b73eae7e11ea4e42c847a5ee9c9c8f96/Timeouts-002.png" />
            
            </figure><p>Client-side timeouts can be simpler or much more complex, depending on which ones you use, but are just as important to prevent leaking resources or getting stuck.</p><p>The easiest to use is the <code>Timeout</code> field of <a href="https://golang.org/pkg/net/http/#Client"><code>http.Client</code></a>. It covers the entire exchange, from Dial (if a connection is not reused) to reading the body.</p>
            <pre><code>c := &amp;http.Client{
    Timeout: 15 * time.Second,
}
resp, err := c.Get("https://blog.filippo.io/")</code></pre>
            <p>Like the server-side case above, the package-level functions such as <code>http.Get</code> use <a href="https://golang.org/pkg/net/http/#DefaultClient">a Client without timeouts</a>, so are dangerous to use on the open Internet.</p><p>For more granular control, there are a number of other more specific timeouts you can set:</p><ul><li><p><code>net.Dialer.Timeout</code> limits the time spent establishing a TCP connection (if a new one is needed).</p></li><li><p><code>http.Transport.TLSHandshakeTimeout</code> limits the time spent performing the TLS handshake.</p></li><li><p><code>http.Transport.ResponseHeaderTimeout</code> limits the time spent reading the headers of the response.</p></li><li><p><code>http.Transport.ExpectContinueTimeout</code> limits the time the client will wait between sending the request headers <i>when including an </i><code><i>Expect: 100-continue</i></code> and receiving the go-ahead to send the body. Note that setting this in 1.6 <a href="https://github.com/golang/go/issues/14391">will disable HTTP/2</a> (<code>DefaultTransport</code> <a href="https://github.com/golang/go/commit/406752b640fcc56a9287b8454564cffe2f0021c1#diff-6951e7593bfb1e773c9121df44df1c36R179">is special-cased from 1.6.2</a>).</p></li></ul>
            <pre><code>c := &amp;http.Client{
    Transport: &amp;http.Transport{
        Dial: (&amp;net.Dialer{
                Timeout:   30 * time.Second,
                KeepAlive: 30 * time.Second,
        }).Dial,
        TLSHandshakeTimeout:   10 * time.Second,
        ResponseHeaderTimeout: 10 * time.Second,
        ExpectContinueTimeout: 1 * time.Second,
    },
}</code></pre>
            <p>As far as I can tell, there's no way to limit the time spent sending the request specifically. The time spent reading the response body can be controlled manually with a <code>time.Timer</code> since it happens after the Client method returns (see below for how to cancel a request).</p><p>Finally, new in 1.7, there's <code>http.Transport.IdleConnTimeout</code>. It does not control a blocking phase of a client request, but how long an idle connection is kept in the connection pool.</p><p>Note that a Client will follow redirects by default. <code>http.Client.Timeout</code> includes all time spent following redirects, while the granular timeouts are specific for each request, since <code>http.Transport</code> is a lower-level system that has no concept of redirects.</p>
    <div>
      <h4>Cancel and Context</h4>
      <a href="#cancel-and-context">
        
      </a>
    </div>
    <p><code>net/http</code> offers two ways to cancel a client request: <code>Request.Cancel</code> and, new in 1.7, Context.</p><p><code>Request.Cancel</code> is an optional channel that, when set and then closed, causes the request to abort as if <code>http.Client.Timeout</code> had been hit. (They are actually implemented through the same mechanism, and while writing this post I <a href="https://github.com/golang/go/issues/16094">found a bug</a> in 1.7 where all cancellations would be returned as timeout errors.)</p><p>We can use <code>Request.Cancel</code> and <code>time.Timer</code> to build a more granular timeout that allows streaming, pushing the deadline back every time we successfully read some data from the Body:</p>
            <pre><code>package main

import (
	"io"
	"io/ioutil"
	"log"
	"net/http"
	"time"
)

func main() {
	c := make(chan struct{})
	timer := time.AfterFunc(5*time.Second, func() {
		close(c)
	})

	// Serve 256 bytes every second.
	req, err := http.NewRequest("GET", "http://httpbin.org/range/2048?duration=8&amp;chunk_size=256", nil)
	if err != nil {
		log.Fatal(err)
	}
	req.Cancel = c

	log.Println("Sending request...")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	log.Println("Reading body...")
	for {
		timer.Reset(2 * time.Second)
		// Try instead: timer.Reset(50 * time.Millisecond)
		_, err = io.CopyN(ioutil.Discard, resp.Body, 256)
		if err == io.EOF {
			break
		} else if err != nil {
			log.Fatal(err)
		}
	}
}</code></pre>
            <p>In the example above, we put a timeout of 5 seconds on the Do phase of the request, but then we spend at least 8 seconds reading the body in 8 rounds, each time with a timeout of 2 seconds. We could go on streaming like this forever without risk of getting stuck. If we stopped receiving body data for more than 2 seconds, io.CopyN would return a <code>net/http: request canceled</code> error.</p><p>In 1.7, the <code>context</code> package graduated to the standard library. There's <a href="https://blog.golang.org/context">a lot to learn about Contexts</a>, but for our purposes you should know that they replace and deprecate <code>Request.Cancel</code>.</p><p>To use Contexts to cancel a request we just obtain a new Context and its <code>cancel()</code> function with <code>context.WithCancel</code> and create a Request bound to it with <code>Request.WithContext</code>. When we want to cancel the request, we cancel the Context by calling <code>cancel()</code> (instead of closing the Cancel channel):</p>
            <pre><code>ctx, cancel := context.WithCancel(context.TODO())
timer := time.AfterFunc(5*time.Second, func() {
	cancel()
})

req, err := http.NewRequest("GET", "http://httpbin.org/range/2048?duration=8&amp;chunk_size=256", nil)
if err != nil {
	log.Fatal(err)
}
req = req.WithContext(ctx)</code></pre>
            <p>Contexts have the advantage that if the parent context (the one we passed to <code>context.WithCancel</code>) is canceled, ours will be, too, propagating the command down the entire pipeline.</p><p>This is all. I hope I didn't exceed your <code>ReadDeadline</code>!</p><p>If this kind of deep dive into the Go standard library sounds entertaining to you, know that <a href="https://www.cloudflare.com/join-our-team/">we are hiring in London, Austin (TX), Champaign (IL), San Francisco and Singapore.</a></p> ]]></content:encoded>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[API]]></category>
            <category><![CDATA[Programming]]></category>
            <category><![CDATA[Go]]></category>
            <guid isPermaLink="false">5IUTFQotCX6VHDqnrENjnx</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Yet Another Padding Oracle in OpenSSL CBC Ciphersuites]]></title>
            <link>https://blog.cloudflare.com/yet-another-padding-oracle-in-openssl-cbc-ciphersuites/</link>
            <pubDate>Wed, 04 May 2016 12:20:42 GMT</pubDate>
            <description><![CDATA[ Yesterday a new vulnerability was announced in OpenSSL/LibreSSL. A padding oracle in CBC mode decryption, to be precise. Just like Lucky13. Actually, it’s in the code that fixes Lucky13. ]]></description>
            <content:encoded><![CDATA[ <p>Yesterday a new vulnerability <a href="https://www.openssl.org/news/secadv/20160503.txt">was announced</a> in OpenSSL/LibreSSL. A <i>padding oracle in CBC mode decryption</i>, to be precise. Just like <a href="https://en.wikipedia.org/wiki/Lucky_Thirteen_attack">Lucky13</a>. Actually, it’s in the code that fixes Lucky13.</p><p>It was found by <a href="http://web-in-security.blogspot.co.uk/2016/05/curious-padding-oracle-in-openssl-cve.html">Juraj Somorovsky</a> using a tool he developed called <a href="https://github.com/RUB-NDS/TLS-Attacker">TLS-Attacker</a>. Like in the “old days”, it has no name except CVE-2016-2107. (I call it LuckyNegative20.[^1])</p><p>It’s a wonderful example of a padding oracle in constant time code, so we’ll dive deep into it. But first, two quick background paragraphs. If you already know all about Lucky13 and how it's mitigated in OpenSSL, <a href="#offby20">jump to "Off by 20"</a> for the hot and new.</p><p>If, before reading, you want to check that your server is safe, you can do it <a href="https://filippo.io/CVE-2016-2107/">with this one-click online test</a>.</p>
    <div>
      <h3>TLS, CBC, and Mac-then-Encrypt</h3>
      <a href="#tls-cbc-and-mac-then-encrypt">
        
      </a>
    </div>
    <p>Very long story short, the CBC cipher suites in TLS have a design flaw: they first compute the HMAC of the plaintext, then encrypt <code>plaintext || HMAC || padding || padding length</code> using CBC mode. The receiving end is then left with the uncomfortable task of decrypting the message and checking HMAC and padding <i>without revealing the padding length in any way</i>. If they do, we call that a <b>padding oracle</b>, and a MitM can use it to learn the value of the last byte of any block, and by iteration often the entire message.</p>
            <figure>
            <a href="https://moxie.org/2011/12/13/the-cryptographic-doom-principle.html">
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/44IkC1Y8GbkE4EY00AfCBO/2bf4bf76ebbc1f91d4e060b66d7e4c7d/S22C-6e16050319300-1.jpg" />
            </a>
            </figure><p>In other words, the CBC mode cipher suites are doomed by <a href="https://moxie.org/2011/12/13/the-cryptographic-doom-principle.html">The Cryptographic Doom Principle</a>. Sadly though, they are the only “safe” cipher suites to use with TLS prior to version 1.2, and they account for <b>26% of connections to the CloudFlare edge</b>, so attacks against them like this one are still very relevant.</p><p>My colleague Nick Sullivan explained CBC padding oracles and their history in much more detail <a href="https://blog.cloudflare.com/padding-oracles-and-the-decline-of-cbc-mode-ciphersuites/">on this very blog</a>. If you didn’t understand my coarse summary, please take the time to read his post as it’s needed to understand what goes wrong next.</p>
    <div>
      <h3>Constant time programming crash course</h3>
      <a href="#constant-time-programming-crash-course">
        
      </a>
    </div>
    <p>The “solution” for the problem of leaking information about the <i>padding length</i> value is to write the entire HMAC and padding check code to run in perfectly constant time. What I once heard Daniel J. Bernstein call “binding your hands behind your back and seeing what you can do with just your nose”. As you might imagine it’s not easy.</p><p>The idea is that you can’t use <code>if</code>s. So instead you store the results of your checks by doing bitwise <code>AND</code>s with a result variable which you set to 1 before you begin, and at which you only look at the end of the entire operation. If any of the checks returned 0, then the result will be 0, otherwise it will be 1. This way the attacker only learns “everything including padding and HMAC is good” or “nope”.</p><p>Not only that, but loop iterations can only depend on public data: if for example the message is 32 bytes long then the padding can be at most 32-1-20=11 bytes, so you have to check 20 (HMAC-SHA1) + 11 bytes. To ignore the bytes you don’t actually have to check, you use a <b>mask</b>: a value generated with constant time operations that will be <code>0xff</code> when the bytes are supposed to match and <code>0x00</code> when they are not. By <code>AND</code>ing the mask to the bitwise difference you get a value which is 0 when the values either match, or are not supposed to match.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1nbnTmVholEmauPKBYPl35/c36b1fa43df1459a31146c16ad3be0cc/S22C-6e16050319300.jpg" />
            
            </figure><p>This technique is crucial to this vulnerability, as it’s exactly its constant time, no-complaints nature that made the mistake possible.</p><p>The <a href="https://github.com/openssl/openssl/blob/cba792a1e941788cba7dc700a2ef59420e7f2522/crypto/evp/e_aes_cbc_hmac_sha1.c#L739-L763">OpenSSL checking code</a> was written after the discovery of <a href="http://www.isg.rhul.ac.uk/tls/TLStiming.pdf">Lucky13</a>, and Adam Langley wrote <a href="https://www.imperialviolet.org/2013/02/04/luckythirteen.html">a long blog post</a> explaining how the generic version works. I glossed over many details here, in particular the incredible pain that is generating the HMAC of a variable-length message in constant time (which is where Lucky13 and <a href="https://eprint.iacr.org/2015/1129.pdf">Lucky Microseconds</a> happened), so go read his incredible write-up to learn more.</p>
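<p>The accumulator idea can be illustrated in a few lines of Go (a toy sketch; the real OpenSSL code is C and also computes the masks that handle variable lengths, which this version doesn't):</p>

```go
package main

import "fmt"

// constantTimeEq returns 1 if x == y and 0 otherwise, without branching.
func constantTimeEq(x, y byte) byte {
	z := x ^ y         // 0 only if the bytes match
	z |= z >> 4        // fold any set bit...
	z |= z >> 2
	z |= z >> 1        // ...down into the lowest bit
	return (z & 1) ^ 1 // 1 when no bit was set, i.e. x == y
}

// checkTag compares a MAC tag against the expected value the way the text
// describes: every byte check is ANDed into a single result variable that
// is only inspected at the end, so the timing doesn't depend on where the
// first mismatch is. got and want are assumed to be the same length.
func checkTag(got, want []byte) bool {
	good := byte(1)
	for i := range got {
		good &= constantTimeEq(got[i], want[i])
	}
	return good == 1
}

func main() {
	fmt.Println(checkTag([]byte{1, 2, 3}, []byte{1, 2, 3})) // true
	fmt.Println(checkTag([]byte{1, 2, 3}, []byte{1, 9, 3})) // false
}
```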
    <div>
      <h3>Off by 20</h3>
      <a href="#off-by-20">
        
      </a>
    </div>
    <p>Like any respectable vulnerability analysis, let’s start with <a href="https://github.com/openssl/openssl/commit/70428eada9bc4cf31424d723d1f992baffeb0dfb">the patch</a>. We’ll focus on the SHA1 part for simplicity, but the SHA256 one is exactly the same.</p><p>The patch for <i>LuckyNegative20</i> is one line in the OpenSSL function that performs AES-CBC decryption and checks HMAC and padding. All it is doing is flagging the check as failed if the padding length value is higher than the maximum it could possibly be.</p>
            <pre><code>              pad = plaintext[len - 1];
              maxpad = len - (SHA_DIGEST_LENGTH + 1);
              maxpad |= (255 - maxpad) &gt;&gt; (sizeof(maxpad) * 8 - 8);
              maxpad &amp;= 255;
  
 +            ret &amp;= constant_time_ge(maxpad, pad);
 +</code></pre>
            <p>This patch points straight at how to trigger the vulnerability, since it can’t have any side effect: it’s constant time code after all! It means that if we do what the patch is there to detect—send a padding length byte higher than <code>maxpad</code>—we can sometimes pass the HMAC/padding check (!!), and use that fact as an oracle.</p><p><code>maxpad</code> is easily calculated: length of the plaintext, minus one byte of padding length, minus 20 bytes of HMAC-SHA1 (<code>SHA_DIGEST_LENGTH</code>), capped at 255. It’s a public value since it only depends on the <i>encrypted</i> data length. It’s also involved in deciding how many times to loop the HMAC/padding check, as we don’t want to go out of the bounds of the message buffer or waste time checking message bytes that couldn’t possibly be padding/HMAC.</p><p>So, how could <code>pad &gt; maxpad</code> allow us to pass the HMAC check fraudulently?</p><p>Let’s go back to our masks, this time thinking what happens if we send:</p>
            <pre><code>pad = maxpad + 1 = (len - 20 - 1) + 1 = (32 - 20 - 1) + 1 = 12</code></pre>
            
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4W0sqfy51XUKtMKyALPyKz/17d959f7ca4aaf9be1a460f3edbbd910/S22C-6e16050319300-2.jpg" />
            
            </figure><p>Uh oh. What happened there!? One byte of the HMAC checking mask fell out of the range we are actually checking! Nothing crashes because this is constant time code and it must tolerate these cases without any external hint. So, what if we push a little further…</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6Dnjxv3GwsnYb0Mlmd90mi/7e4acf897a76a897eef7fe38e33a662c/S22C-6e16050319300-3.jpg" />
            
            </figure><p>There. The HMAC mask is now 0 for all checking loop iterations, so we don’t have to worry about having a valid signature at all anymore, and we’ll pass the MAC/padding check unconditionally if all bytes in the message are valid padding bytes—that is, equal to the padding length byte.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4BpDOnIA9ejgpYAYSnK56r/79b181746f24dce28a72d156fc520780/S22C-6e16050319300-4.jpg" />
            
            </figure><p>I like to box and label these kinds of “cryptanalytic functions” and call them capabilities. Here’s our capability:</p><p><b>We can discover if a message is made entirely of bytes with value n, where n &gt;= maxpad + 20</b> by sending it to the server and observing an error different from BAD_MAC. This can only work with messages shorter than 256-20=236 bytes, as pad can be at most 255, being an 8-bit value.</p><p>NOTE: As <a href="http://web-in-security.blogspot.co.uk/2016/05/curious-padding-oracle-in-openssl-cve.html">Juraj clarifies</a>, there is no need to do delicate timing statistics like in Lucky 13, because if the tampered message is sent at the time the Finished message should be sent, the server response (BAD_MAC or not) will be unencrypted.</p><p>We can use this capability as an oracle.</p><p>In other words, the entire trick is making the computed payload length "negative" by at least <code>DIGEST_LENGTH</code>, so that the computed position of the HMAC mask is fully "outside" the message and the MAC ends up not being checked at all.</p><p>By the way, the fact that the affected function is called <code>aesni_cbc_hmac_sha1_cipher</code> explains why only servers with <a href="https://en.wikipedia.org/wiki/AES_instruction_set">AES-NI instructions</a> are vulnerable.</p><p><i>You can find a standalone, simplified version of the vulnerable function </i><a href="https://gist.github.com/FiloSottile/064cae24c11e792a46af881dfd826f76"><i>here</i></a><i>, if you want to play with the debugger yourself.</i></p>
    <div>
      <h3>From there to decryption</h3>
      <a href="#from-there-to-decryption">
        
      </a>
    </div>
    <p>Ok, we have a “capability”. But how do we go from there to decryption? The “classic” way involves focusing on one byte that we don’t know, preceded by bytes we control (in this case, a string of <code>31</code>). We then guess the unknown byte by XOR’ing the corresponding ciphertext byte in the previous block with a value <code>n</code>. We do this for many <code>n</code>. When the server replies “check passed”, we know that the mystery byte is <code>31 XOR n</code>.</p><p>Remember that this is CBC mode, so we can cut out any consecutive sequence of blocks and present it as its own ciphertext, and values XOR’d to a given ciphertext byte end up XOR’d to the corresponding plaintext byte in the next block. The diagram below should make this clearer.</p><p>In this scenario, we inject JavaScript in a plain-HTTP page the user has loaded in a different tab and send as many AJAX requests as we want to our target domain. Usually we use the <code>GET</code> path as the known controlled bytes and the headers as the decryption target.</p><p>However, this approach doesn’t work in this case because we need <i>two</i> blocks of consecutive equal plaintext bytes, and if we touch the ciphertext in block <code>x</code> to help decrypt block <code>x + 1</code>, we mangle the plaintext of block <code>x</code> irreparably.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1oMUox1ZJxyFNdezDD7N4v/124e57480a63374e9bd064608a0fd298/S22C-6e16050401360-1-1.jpg" />
            
            </figure><p>I’d like you to pause for a minute to realize that this is at all possible only because we:</p><ol><li><p>Allow unauthenticated (plain-HTTP) websites—and so any MitM—to run code on our machines.</p></li><li><p>Allow JavaScript to send requests to third-party origins (i.e. other websites) which carry our private cookies. Incidentally, this is also what enables CSRF.</p></li></ol><p>Back to our oracle, we can make it work the other way around: we align an unknown byte at the beginning of two blocks made of <code>31</code> (which we can send as POST body) and we XOR our <code>n</code> values with the first byte in the IV. Again, when the MAC check passes, we know that the target byte is <code>31 XOR n</code>.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2tda3NbvCPrDOYBmz6fWbm/8b5091edfc56ab0cf9bc23495b6eba3e/S22C-6e16050401360-2.jpg" />
            
            </figure><p>We iterate this process, moving all bytes one position to the right (for example by making the POST path one character longer), and XOR’ing the first byte of the IV with our <code>n</code> guesses, and the second byte with the number we now know will turn it into a <code>31</code>.</p><p>And so on and so forth. This gets us to decrypt any 16 bytes that consistently appear just before a sequence of at least 31 attacker-controlled bytes and after a string of attacker-controlled length (for alignment).</p><p>It might be possible to extend the decryption to all blocks preceding the two controlled ones by carefully pausing the client request and using the early feedback, but I'm not sure it works. (Chances of me being wrong on full decryption capabilities are high, considering that an <a href="https://twitter.com/RichSalz/status/727527231686344705">OpenSSL team member said they are not aware of such a technique</a>.) UPDATE: we chatted a bit with Juraj, and no, it doesn't work.</p>
    <div>
      <h3>But no further</h3>
      <a href="#but-no-further">
        
      </a>
    </div>
    <p>One might think that since this boils down to a MAC check bypass, we might use it to inject unauthenticated messages into the connection. However, since we rely on making <code>pad</code> higher than <code>maxpad + DIGEST_LENGTH</code>, we can't avoid making the payload length (<code>totlen - 1 - DIGEST_LENGTH - pad</code>) negative.</p>
            <pre><code>pad &gt; maxpad + DIGEST_LENGTH = (totlen - DIGEST_LENGTH - 1) + DIGEST_LENGTH = totlen - 1 &gt; totlen - 1 - DIGEST_LENGTH
0 &gt; totlen - 1 - DIGEST_LENGTH - pad = payload_len</code></pre>
            <p>Since <code>length</code> is stored as an unsigned integer, when it goes negative it wraps around to a very high number <a href="https://github.com/openssl/openssl/blob/9f2ccf1d718ab66c778a623f9aed3cddf17503a2/ssl/s3_pkt.c#L550-L554">and triggers the "payload too long" error</a>, which terminates the connection. Too bad... Good. I meant good :-)</p><p>This means, by the way, that successful attempts to exploit or test this vulnerability show up in OpenSSL logs as</p>
            <pre><code>SSL routines:SSL3_GET_RECORD:data length too long:s3_pkt.c:</code></pre>
            
    <div>
      <h3>Testing for the vulnerability</h3>
      <a href="#testing-for-the-vulnerability">
        
      </a>
    </div>
    <p>Detecting a vulnerable server is as easy as sending an encrypted message which decrypts to <code>AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA</code>, and checking if the TLS alert is <code>DATA_LENGTH_TOO_LONG</code> (vulnerable) or <code>BAD_RECORD_MAC</code> (not vulnerable).</p><p>It works because <code>A</code> is 65, which is at least maxpad + 20 = (32 - 20 - 1) + 20 = 31.</p><p>There’s a simple implementation at <a href="https://github.com/FiloSottile/CVE-2016-2107">https://github.com/FiloSottile/CVE-2016-2107</a> and an online version at <a href="https://filippo.io/CVE-2016-2107/">https://filippo.io/CVE-2016-2107/</a>.</p>
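<p>Spelling out that arithmetic (with the constants hardcoded for a 32-byte record):</p>

```go
package main

import "fmt"

func main() {
	const (
		totlen       = 32 // length of the decrypted record
		digestLength = 20 // HMAC-SHA1
	)
	maxpad := totlen - digestLength - 1 // 11
	pad := int('A')                     // every byte, padding length included, is 65

	// pad >= maxpad + 20 pushes the whole HMAC mask out of range, so the
	// unpatched code skips the MAC check entirely.
	fmt.Println(pad >= maxpad+digestLength) // true
}
```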
    <div>
      <h3>Conclusion</h3>
      <a href="#conclusion">
        
      </a>
    </div>
    <p>To sum up the impact: when the connection uses AES-CBC (for example because the server or the client don’t support TLS 1.2 yet) and the server’s processor supports AES-NI, a skilled MitM attacker can recover at least 16 bytes of anything it can get the client to send repeatedly just before attacker-controlled data (like HTTP Cookies, using JavaScript cross-origin requests).</p><p>A more skilled attacker than me might also be able to decrypt more than 16 bytes, but no one has shown that it’s possible yet.</p><p>All CloudFlare websites are protected from this vulnerability. Connections from the CloudFlare edge to origins use TLS 1.2 and AES-GCM if the server supports it, so are safe in that case. Customers supporting only AES-CBC should upgrade as soon as possible.</p><p>In closing, the cryptographic development community has shown that writing secure constant-time MtE decryption code is extremely difficult to do. We can keep patching this code, but the safer thing to do is to move to cryptographic primitives like <a href="/go-crypto-bridging-the-performance-gap/">AEAD</a> that are designed not to require constant time acrobatics. Eventually, TLS 1.2 adoption will be high enough to give CBC the <a href="/killing-rc4-the-long-goodbye/">RC4</a> <a href="/end-of-the-road-for-rc4/">treatment</a>. But we are not there yet.</p><p><i>Are you crazy enough that analyzing a padding oracle in a complex crypto protocol sounds </i><b><i>exciting</i></b><i> to you? </i><a href="https://www.cloudflare.com/join-our-team/"><i>We are hiring in London, San Francisco, Austin and Singapore</i></a><i>, including cryptography roles.</i></p><p>Thanks to Anna Bernardi who helped with the analysis and the painful process of reading OpenSSL code. Thanks (in no particular order) to Nick Sullivan, Ryan Hodson, Evan Johnson, Joshua A. Kroll, John Graham-Cumming and Alessandro Ghedini for their help with this write-up.</p><hr />
            <p>[^1]: Although JGC insists it should be called FreezerBurn because -20, geddit?</p>
             ]]></content:encoded>
            <category><![CDATA[TLS]]></category>
            <category><![CDATA[Vulnerabilities]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[SSL]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">1TPtOeFIyWKh61mmkpakWc</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Building the simplest Go static analysis tool]]></title>
            <link>https://blog.cloudflare.com/building-the-simplest-go-static-analysis-tool/</link>
            <pubDate>Wed, 27 Apr 2016 15:01:15 GMT</pubDate>
            <description><![CDATA[ Go native vendoring (a.k.a. GO15VENDOREXPERIMENT) allows you to freeze dependencies by putting them in a vendor folder in your project. The compiler will then look there before searching the GOPATH. ]]></description>
            <content:encoded><![CDATA[ <p><a href="https://docs.google.com/document/d/1Bz5-UB7g2uPBdOx-rw5t9MxJwkfpx90cqG9AFL0JAYo/edit">Go native vendoring</a> (a.k.a. GO15VENDOREXPERIMENT) allows you to freeze dependencies by putting them in a <code>vendor</code> folder in your project. The compiler will then look there before searching the GOPATH.</p><p>The only annoyance compared to using a per-project GOPATH, which is what we used to do, is that you might forget to vendor a package that you have in your GOPATH. The program will build for you, but it won't for anyone else. Back to the <a href="https://www.urbandictionary.com/define.php?term=wfm">WFM</a> times!</p><p>I decided I wanted something, a tool, to check that all my (non-stdlib) dependencies were vendored.</p><p>At first I thought of using <a href="https://golang.org/cmd/go/#hdr-List_packages"><code>go list</code></a>, which Dave Cheney appropriately called a <a href="http://dave.cheney.net/2014/09/14/go-list-your-swiss-army-knife">swiss army knife</a>, but while it can show the entire recursive dependency tree (format <code>.Deps</code>), there's no way to know from the templating engine if a dependency is in the standard library.</p><p>We could just pass each output back into <code>go list</code> to check for <code>.Standard</code>, but I thought this would be a good occasion to build a very simple static analysis tool. Go's simplicity and libraries make it a very easy task, as you will see.</p>
    <div>
      <h3>First, loading the program</h3>
      <a href="#first-loading-the-program">
        
      </a>
    </div>
    <p>We use <a href="https://godoc.org/golang.org/x/tools/go/loader"><code>golang.org/x/tools/go/loader</code></a> to load the packages passed as arguments on the command line, including the test files based on a flag.</p>
            <pre><code>var conf loader.Config
for _, p := range flag.Args() {
    if *tests {
        conf.ImportWithTests(p)
    } else {
        conf.Import(p)
    }
}
prog, err := conf.Load()
if err != nil {
    log.Fatal(err)
}
for p := range prog.AllPackages {
    fmt.Println(p.Path())
}</code></pre>
            <p>With these few lines we already replicated <code>go list -f {{ .Deps }}</code>!</p><p>The only missing loading feature here is wildcard (<code>./...</code>) support. That code <a href="https://github.com/golang/go/blob/87bca88c703c1f14fe8473dc2f07dc521cf2b989/src/cmd/go/main.go#L365">is in the go tool source</a> and it's unexported. There's an <a href="https://github.com/golang/go/issues/8768">issue</a> about exposing it, but for now packages <a href="https://github.com/golang/lint/blob/58f662d2fc0598c6c36a92ae29af1caa6ec89d7a/golint/import.go">are just copy-pasting it</a>. We'll use a packaged version of that code, <a href="https://github.com/kisielk/gotool"><code>github.com/kisielk/gotool</code></a>:</p>
            <pre><code>for _, p := range gotool.ImportPaths(flag.Args()) {</code></pre>
            <p>Finally, since we are only interested in the dependency tree today, we instruct the parser to go only as far as the import statements and we ignore the resulting "not used" errors:</p>
            <pre><code>conf.ParserMode = parser.ImportsOnly
conf.AllowErrors = true
conf.TypeChecker.Error = func(error) {}</code></pre>
            
    <div>
      <h3>Then, the actual logic</h3>
      <a href="#then-the-actual-logic">
        
      </a>
    </div>
    <p>We now have a <code>loader.Program</code> object, which holds references to various <code>loader.PackageInfo</code> objects, which in turn are a combination of package, AST and types information. All you need to perform any kind of complex analysis. Not that we are going to do that today :)</p><p>We'll just replicate <a href="https://github.com/golang/go/blob/87bca88c703c1f14fe8473dc2f07dc521cf2b989/src/cmd/go/pkg.go#L183-L194">the <code>go list</code> logic to recognize stdlib packages</a> and remove the packages passed on the command line from the list:</p>
            <pre><code>initial := make(map[*loader.PackageInfo]bool)
for _, pi := range prog.InitialPackages() {
    initial[pi] = true
}

var packages []*loader.PackageInfo
for _, pi := range prog.AllPackages {
    if initial[pi] {
        continue
    }
    if len(pi.Files) == 0 {
        continue // virtual stdlib package
    }
    filename := prog.Fset.File(pi.Files[0].Pos()).Name()
    if !strings.HasPrefix(filename, build.Default.GOROOT) ||
        !isStandardImportPath(pi.Pkg.Path()) {
        packages = append(packages, pi)
    }
}</code></pre>
            <p>Then we just have to print a warning if any remaining package is not in a <code>/vendor/</code> folder:</p>
            <pre><code>for _, pi := range packages {
    if !strings.Contains(pi.Pkg.Path(), "/vendor/") {
        fmt.Println("[!] dependency not vendored:", pi.Pkg.Path())
    }
}</code></pre>
            <p>Done! You can find the tool here: <a href="https://github.com/FiloSottile/vendorcheck">https://github.com/FiloSottile/vendorcheck</a></p>
    <div>
      <h3>Further reading</h3>
      <a href="#further-reading">
        
      </a>
    </div>
    <p><a href="https://github.com/golang/example/tree/master/gotypes#gotypes-the-go-type-checker">This document</a>, maintained by Alan Donovan, will tell you more than I'll ever know about static analysis tooling.</p><p>Note that you might be tempted to use <code>go/importer</code> and <code>types.Importer[From]</code> instead of <code>x/go/loader</code>. Don't do that. That doesn't load the source but reads compiled <code>.a</code> files, which <b>can be stale or missing</b>. Static analysis tools that spit out "package not found" for existing packages or, worse, incorrect results because of this are a pet peeve of mine.</p><p><i>If you now feel the urge to write static analysis tools, know that the CloudFlare Go team </i><a href="https://www.cloudflare.com/join-our-team/"><i>is hiring in London, San Francisco and Singapore</i></a><i>!</i></p> ]]></content:encoded>
            <category><![CDATA[Tools]]></category>
            <category><![CDATA[Go]]></category>
            <category><![CDATA[Programming]]></category>
            <guid isPermaLink="false">7f5NBXh02bwJ9WyQmBdtZK</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Go coverage with external tests]]></title>
            <link>https://blog.cloudflare.com/go-coverage-with-external-tests/</link>
            <pubDate>Tue, 19 Jan 2016 18:19:59 GMT</pubDate>
            <description><![CDATA[ The Go test coverage implementation is quite ingenious: when asked to, the Go compiler will preprocess the source so that when each code portion is executed a bit is set in a coverage bitmap. ]]></description>
            <content:encoded><![CDATA[ <p>The Go test coverage implementation is <a href="https://blog.golang.org/cover">quite ingenious</a>: when asked to, the Go compiler will preprocess the source so that when each code portion is executed a bit is set in a coverage bitmap. This is integrated in the <code>go test</code> tool: <code>go test -cover</code> enables it and <code>-coverprofile=</code> allows you to write a profile to then inspect with <code>go tool cover</code>.</p><p>This makes it very easy to get unit test coverage, but <b>there's no simple way to get coverage data for tests that you run against the main version of your program, like end-to-end tests</b>.</p><p>The proper fix would involve adding <code>-cover</code> preprocessing support to <code>go build</code>, and exposing the coverage profile maybe as a <code>runtime/pprof.Profile</code>, but as of Go 1.6 there’s no such support. Here instead is a hack we've been using for a while in the test suite of <a href="/tag/rrdns/">RRDNS</a>, our custom Go DNS server.</p><p>We create a <b>dummy test</b> that executes <code>main()</code>, we put it behind a build tag, compile a binary with <code>go test -c -cover</code> and then run only that test instead of running the regular binary.</p><p>Here's what the <code>rrdns_test.go</code> file looks like:</p>
            <pre><code>// +build testrunmain

package main

import "testing"

func TestRunMain(t *testing.T) {
	main()
}</code></pre>
            <p>We compile the binary like this</p>
            <pre><code>$ go test -coverpkg="rrdns/..." -c -tags testrunmain rrdns</code></pre>
            <p>And then when we want to collect coverage information, we execute this instead of <code>./rrdns</code> (and run our test battery as usual):</p>
            <pre><code>$ ./rrdns.test -test.run "^TestRunMain$" -test.coverprofile=system.out</code></pre>
            <p>You must return from <code>main()</code> cleanly for the profile to be written to disk; in RRDNS we do that by catching SIGINT. You can still use command line arguments and standard input normally, just note that you will get two lines of extra output from the test framework.</p><p>Finally, since you probably also run unit tests, you might want to merge the coverage profiles with <a href="https://github.com/wadey/gocovmerge">gocovmerge</a> (from <a href="https://github.com/golang/go/issues/6909#issuecomment-124185553">issue #6909</a>):</p>
            <pre><code>$ go get github.com/wadey/gocovmerge
$ gocovmerge unit.out system.out &gt; all.out
$ go tool cover -html all.out</code></pre>
            <p>If finding creative ways to test big-scale network services sounds fun, know that <a href="https://www.cloudflare.com/join-our-team/">we are hiring in London, San Francisco and Singapore</a>.</p> ]]></content:encoded>
            <category><![CDATA[RRDNS]]></category>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Programming]]></category>
            <category><![CDATA[Go]]></category>
            <category><![CDATA[Best Practices]]></category>
            <guid isPermaLink="false">285AT2igoBQiRZFlDLnGsZ</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Creative foot-shooting with Go RWMutex]]></title>
            <link>https://blog.cloudflare.com/creative-foot-shooting-with-go-rwmutex/</link>
            <pubDate>Thu, 29 Oct 2015 21:26:36 GMT</pubDate>
            <description><![CDATA[ Hi, I'm Filippo and today I managed to surprise myself! (And not in a good way.)

I'm developing a new module ("filter" as we call them) for RRDNS, CloudFlare's Go DNS server.  ]]></description>
            <content:encoded><![CDATA[ <p>Hi, I'm Filippo and today I managed to surprise myself! (And not in a good way.)</p><p>I'm developing a new module ("filter" as we call them) for <a href="/tag/rrdns/">RRDNS</a>, CloudFlare's Go DNS server. It's a rewrite of the authoritative module, the one that adds the IP addresses to DNS answers.</p><p>It has a table of CloudFlare IPs that looks like this:</p>
            <pre><code>type IPMap struct {
	sync.RWMutex
	M map[string][]net.IP
}</code></pre>
            <p>It's a global filter attribute:</p>
            <pre><code>type V2Filter struct {
	name       string
	IPTable    *IPMap
	// [...]
}</code></pre>
            
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/4YvPr9fRpSXLyFVZ4BL2lr/b11623aba2d210ffac4dcc6a11f43b32/1280px-Mexican_Standoff.jpg" />
            
            </figure><p><a href="https://www.flickr.com/photos/28293006@N05/8144747570">CC-BY-NC-ND image by Martin SoulStealer</a></p><p>The table changes often, so a background goroutine periodically reloads it from our distributed key-value store, acquires the lock (<code>f.IPTable.Lock()</code>), updates it and releases the lock (<code>f.IPTable.Unlock()</code>). This happens every 5 minutes.</p><p>Everything worked in tests, including multiple and concurrent requests.</p><p>Today we deployed to an off-production test machine and everything worked. For a few minutes. Then RRDNS stopped answering queries for the beta domains served by the new code.</p><p>What. <i>That worked on my laptop</i>™.</p><p>Here's the IPTable consumer function. You can probably spot the bug.</p>
            <pre><code>func (f *V2Filter) getCFAddr(...) (result []dns.RR) {
	f.IPTable.RLock()
	// [... append IPs from f.IPTable.M to result ...]
	return
}</code></pre>
            <p><code>f.IPTable.RUnlock()</code> is never called. Whoops. But it's an RLock, so multiple <code>getCFAddr</code> calls should work, and only table reloading should break, no? Instead <code>getCFAddr</code> started blocking after a few minutes. To the docs!</p><p><i>To ensure that the lock eventually becomes available, a blocked Lock call excludes new readers from acquiring the lock.</i> <a href="https://golang.org/pkg/sync/#RWMutex.Lock">https://golang.org/pkg/sync/#RWMutex.Lock</a></p><p>So everything worked and RLocks piled up until the table reload function ran; then the pending Lock call caused all following RLock calls to block, breaking RRDNS answer generation.</p><p>In tests the table reload function never ran while answering queries, so <code>getCFAddr</code> kept piling up RLock calls but never blocked.</p><p>No customers were affected because A) the release was still being tested on off-production machines and B) no real customers were running on the new code yet. Anyway, it was an interesting way to cause a deferred deadlock.</p><p>In closing, there's probably room for better tooling here. A static analysis tool might output a listing of all Lock/Unlock calls, and a dynamic analysis tool might report Mutexes still [r]locked at the end of tests. (Or maybe these tools already exist, in which case let me know!)</p><p><i>Do you want to help (introduce </i><code><i>:)</i></code><i> and) fix bugs in the DNS server answering more than 50 billion queries every day? </i><a href="https://www.cloudflare.com/join-our-team"><i>We are hiring in London, San Francisco and Singapore!</i></a></p> ]]></content:encoded>
            <category><![CDATA[RRDNS]]></category>
            <category><![CDATA[Bugs]]></category>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Programming]]></category>
            <category><![CDATA[Go]]></category>
            <guid isPermaLink="false">5iwarGrOuKY1ZIhN7o19u9</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[DNS parser, meet Go fuzzer]]></title>
            <link>https://blog.cloudflare.com/dns-parser-meet-go-fuzzer/</link>
            <pubDate>Thu, 06 Aug 2015 13:40:40 GMT</pubDate>
            <description><![CDATA[ Here at CloudFlare we are heavy users of the github.com/miekg/dns Go DNS library and we make sure to contribute to its development as much as possible. Therefore when Dmitry Vyukov published go-fuzz and started to uncover tens of bugs in the Go standard library, our task was clear. ]]></description>
            <content:encoded><![CDATA[ <p>Here at CloudFlare we are heavy users of the <a href="https://github.com/miekg/dns"><code>github.com/miekg/dns</code></a> Go DNS library and we make sure to contribute to its development as much as possible. Therefore when <a href="https://github.com/dvyukov">Dmitry Vyukov</a> published go-fuzz and started to uncover tens of bugs in the Go standard library, our task was clear.</p>
    <div>
      <h3>Hot Fuzz</h3>
    </div>
    <p>Fuzzing is the technique of <i>testing software by continuously feeding it inputs that are automatically mutated</i>. For C/C++, the wildly successful <a href="http://lcamtuf.coredump.cx/afl/">afl-fuzz</a> tool by Michał Zalewski uses instrumented source coverage to judge which mutations pushed the program into new paths, <i>eventually hitting many rarely-tested branches</i>.</p><p><a href="https://github.com/dvyukov/go-fuzz"><i>go-fuzz</i></a><i> applies the same technique to Go programs</i>, instrumenting the source by rewriting it (<a href="/go-has-a-debugger-and-its-awesome/">like godebug does</a>). An interesting difference between afl-fuzz and go-fuzz is that the former normally operates on file inputs to unmodified programs, while the latter asks you to <i>write a Go function and passes inputs to that</i>. The former usually forks a new process for each input, while the latter keeps calling the function without restarting often.</p><p>There is no strong technical reason for this difference (and indeed afl recently gained the ability to behave like go-fuzz), but it's likely due to the <i>different ecosystems</i> in which they operate: Go programs often expose <i>well-documented, well-behaved APIs</i> which enable the tester to write a good wrapper that doesn't contaminate state across calls. Also, Go programs are often easier to dive into and <i>more predictable</i>, thanks obviously to GC and memory management, but also to the general community repulsion towards unexpected global states and side effects. On the other hand, many legacy C code bases are so intractable that the easy and stable file input interface is worth the performance tradeoff.</p><p>Back to our DNS library. RRDNS, our in-house DNS server, uses <code>github.com/miekg/dns</code> for all its parsing needs, and it has proved to be up to the task. However, it's a bit fragile on the edge cases and has a track record of panicking on malformed packets. 
Thankfully, this is Go, not <a href="/a-deep-look-at-cve-2015-5477-and-how-cloudflare-virtual-dns-customers-are-protected/">BIND</a> C, and we can afford to <code>recover()</code> panics without worrying about ending up with insane memory states. Here's what we are doing:</p>
            <pre><code>func ParseDNSPacketSafely(buf []byte, msg *old.Msg) (err error) {
	defer func() {
		panicked := recover()

		if panicked != nil {
			err = errors.New("ParseError")
		}
	}()

	err = msg.Unpack(buf)

	return
}</code></pre>
            <p>We saw an opportunity to make the library more robust so we wrote this initial simple fuzzing function:</p>
            <pre><code>func Fuzz(rawMsg []byte) int {
    msg := &amp;dns.Msg{}

    if unpackErr := msg.Unpack(rawMsg); unpackErr != nil {
        return 0
    }

    if _, packErr := msg.Pack(); packErr != nil {
        println("failed to pack back a message")
        spew.Dump(msg)
        panic(packErr)
    }

    return 1
}</code></pre>
            <p>To create a corpus of initial inputs we took our stress and regression test suites and used <code>github.com/miekg/pcap</code> to write a file per packet.</p>
            <pre><code>package main

import (
	"crypto/rand"
	"encoding/hex"
	"log"
	"os"
	"strconv"

	"github.com/miekg/pcap"
)

func fatalIfErr(err error) {
	if err != nil {
		log.Fatal(err)
	}
}

func main() {
	handle, err := pcap.OpenOffline(os.Args[1])
	fatalIfErr(err)

	b := make([]byte, 4)
	_, err = rand.Read(b)
	fatalIfErr(err)
	prefix := hex.EncodeToString(b)

	i := 0
	for pkt := handle.Next(); pkt != nil; pkt = handle.Next() {
		pkt.Decode()

		f, err := os.Create("p_" + prefix + "_" + strconv.Itoa(i))
		fatalIfErr(err)
		_, err = f.Write(pkt.Payload)
		fatalIfErr(err)
		fatalIfErr(f.Close())

		i++
	}
}</code></pre>
            
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1JJf54JXErYsYRprnQO8nk/aec6c0359fc16d482d96bea2d37e8c3c/11597106396_a1927f8c71_z.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a> <a href="https://www.flickr.com/photos/jdhancock/11597106396/in/photolist-iENcGh-5GuU8g-4G3SzJ-cybzvf-ej9ytf-5PT2gy-2wCHkp-oTNLKN-4T5TVk-pikg74-64fbtb-64fbny-6iPZrk-6WSbWA-gTwR9P-6JEbMJ-uS5Qoe-p3LoLt-8rTRPb-gzJBbc-6u4Ko7-4uXbz8-bX4rtL-6HoBT8-cybFb7-pDtnkY-doskG2-a9tSqx-3NX4E-978gS2-4iW5fs-4VhKK2-7EpKqc-7EtB6Y-7EtAiN-7EpH54-7EpGgX-poTg8-55WEef-qfzP-dt83Gq-naDJvs-aCKhDG-drR492-aTSFS6-aTSEER-aTSC2t-aTSAUR-qqFXz2-ftsXnd">image</a> by <a href="https://www.flickr.com/photos/jdhancock/">JD Hancock</a></p><p>We then compiled our <code>Fuzz</code> function with go-fuzz, and launched the fuzzer on a lab server. The first thing go-fuzz does is minimize the corpus by throwing away packets that trigger the same code paths, then it starts mutating the inputs and passing them to <code>Fuzz()</code> in a loop. The mutations that don't fail (<code>return 1</code>) and <i>expand code coverage</i> are kept and iterated over. When the program panics, a small report (input and output) is saved and the program restarted. If you want to learn more about go-fuzz watch <a href="https://www.youtube.com/watch?v=a9xrxRsIbSU">the author's GopherCon talk</a> or read <a href="https://github.com/dvyukov/go-fuzz">the README</a>.</p><p><i>Crashes, mostly "index out of bounds", started to surface.</i> go-fuzz becomes pretty slow and ineffective when the program crashes often, so while the CPUs burned I started fixing the bugs.</p><p>In some cases I just decided to change some parser patterns, for example <a href="https://github.com/miekg/dns/commit/b5133fead4c0571c20eea405a917778f011dde02">reslicing and using <code>len()</code> instead of keeping offsets</a>. 
However, these can be potentially disruptive changes (I'm far from perfect), so I adapted the Fuzz function to keep an eye on the differences between the old parser and the new, fixed one, and crash if the new parser started refusing good packets or changed its behavior:</p>
            <pre><code>func Fuzz(rawMsg []byte) int {
    var (
        msg, msgOld = &amp;dns.Msg{}, &amp;old.Msg{}
        buf, bufOld = make([]byte, 100000), make([]byte, 100000)
        res, resOld []byte

        unpackErr, unpackErrOld error
        packErr, packErrOld     error
    )

    unpackErr = msg.Unpack(rawMsg)
    unpackErrOld = ParseDNSPacketSafely(rawMsg, msgOld)

    if unpackErr != nil &amp;&amp; unpackErrOld != nil {
        return 0
    }

    if unpackErr != nil &amp;&amp; unpackErr.Error() == "dns: out of order NSEC block" {
        // 97b0a31 - rewrite NSEC bitmap [un]packing to account for out-of-order
        return 0
    }

    if unpackErr != nil &amp;&amp; unpackErr.Error() == "dns: bad rdlength" {
        // 3157620 - unpackStructValue: drop rdlen, reslice msg instead
        return 0
    }

    if unpackErr != nil &amp;&amp; unpackErr.Error() == "dns: bad address family" {
        // f37c7ea - Reject a bad EDNS0_SUBNET family on unpack (not only on pack)
        return 0
    }

    if unpackErr != nil &amp;&amp; unpackErr.Error() == "dns: bad netmask" {
        // 6d5de0a - EDNS0_SUBNET: refactor netmask handling
        return 0
    }

    if unpackErr != nil &amp;&amp; unpackErrOld == nil {
        println("new code fails to unpack valid packets")
        panic(unpackErr)
    }

    res, packErr = msg.PackBuffer(buf)

    if packErr != nil {
        println("failed to pack back a message")
        spew.Dump(msg)
        panic(packErr)
    }

    if unpackErrOld == nil {

        resOld, packErrOld = msgOld.PackBuffer(bufOld)

        if packErrOld == nil &amp;&amp; !bytes.Equal(res, resOld) {
            println("new code changed behavior of valid packets:")
            println()
            println(hex.Dump(res))
            println(hex.Dump(resOld))
            os.Exit(1)
        }

    }

    return 1
}</code></pre>
            <p>I was pretty happy about the robustness gain, but since we used the <code>ParseDNSPacketSafely</code> wrapper in RRDNS I didn't expect to find security vulnerabilities. I was wrong!</p><p>DNS names are made of labels, usually shown separated by dots. In a space-saving effort, labels can be replaced by pointers to other names, so that if we know we encoded <code>example.com</code> at offset 15, <code>www.example.com</code> can be packed as <code>www.</code> + <i>PTR(15)</i>. What we found was <a href="https://github.com/FiloSottile/dns/commit/b364f94">a bug in the handling of pointers to empty names</a>: when encountering the end of a name (<code>0x00</code>), if no labels had been read, <code>"."</code> (the empty name) was returned as a special case. The problem is that this special case was unaware of pointers, and it would instruct the parser to resume reading from the end of the pointed-to empty name instead of the end of the original name.</p><p>For example, if the parser encountered at offset 60 a pointer to offset 15, and <code>msg[15] == 0x00</code>, parsing would then resume from offset 16 instead of 61, causing an infinite loop. This is a potential Denial of Service vulnerability.</p>
            <pre><code>A) Parse up to position 60, where a DNS name is found

| ... |  15  |  16  |  17  | ... |  58  |  59  |  60  |  61  |
| ... | 0x00 |      |      | ... |      |      | -&gt;15 |      |

-------------------------------------------------&gt;     

B) Follow the pointer to position 15

| ... |  15  |  16  |  17  | ... |  58  |  59  |  60  |  61  |
| ... | 0x00 |      |      | ... |      |      | -&gt;15 |      |

         ^                                        |
         ------------------------------------------      

C) Return a empty name ".", special case triggers

D) Erroneously resume from position 16 instead of 61

| ... |  15  |  16  |  17  | ... |  58  |  59  |  60  |  61  |
| ... | 0x00 |      |      | ... |      |      | -&gt;15 |      |

                 --------------------------------&gt;   

E) Rinse and repeat</code></pre>
            <p>We sent the fixes privately to the library maintainer while we patched our servers and we <a href="https://github.com/miekg/dns/pull/237">opened a PR</a> once done. (Two bugs were independently found and fixed by Miek while we released our RRDNS updates, as it happens.)</p>
    <div>
      <h3>Not just crashes and hangs</h3>
    </div>
    <p>Thanks to its flexible fuzzing API, go-fuzz lends itself nicely not only to the mere search for crashing inputs, but <i>can be used to explore all scenarios where edge cases are troublesome</i>.</p><p>Useful applications range from checking output validation by adding crashing assertions to your <code>Fuzz()</code> function, to comparing the two ends of an unpack-pack chain, and even comparing the behavior of two different versions or implementations of the same functionality.</p><p>For example, while preparing our <a href="/tag/dnssec/">DNSSEC</a> engine for launch, I faced a weird bug that would happen only in production or under stress tests: <i>NSEC records that were supposed to have only a couple of bits set in their types bitmap would sometimes look like this:</i></p>
            <pre><code>deleg.filippo.io.  IN  NSEC    3600    \000.deleg.filippo.io. NS WKS HINFO TXT AAAA LOC SRV CERT SSHFP RRSIG NSEC TLSA HIP TYPE60 TYPE61 SPF</code></pre>
            <p>The catch was that our "pack and send" code <i>pools </i><code><i>[]byte</i></code><i> buffers to reduce GC and allocation churn</i>, so buffers passed to <code>dns.Msg.PackBuffer(buf []byte)</code> can be "dirty" from previous uses.</p>
            <pre><code>var bufpool = sync.Pool{
    New: func() interface{} {
        return make([]byte, 0, 2048)
    },
}

[...]

    data := bufpool.Get().([]byte)
    defer bufpool.Put(data)

    if data, err = r.Response.PackBuffer(data); err != nil {</code></pre>
            <p>However, <code>buf</code> not being an array of zeroes was not handled by some <code>github.com/miekg/dns</code> packers, including the NSEC rdata one, which would <i>just OR in the present bits, without clearing the ones that are supposed to be absent</i>.</p>
            <pre><code>case `dns:"nsec"`:
    lastwindow := uint16(0)
    length := uint16(0)
    for j := 0; j &lt; val.Field(i).Len(); j++ {
        t := uint16((fv.Index(j).Uint()))
        window := uint16(t / 256)
        if lastwindow != window {
            off += int(length) + 3
        }
        length = (t - window*256) / 8
        bit := t - (window * 256) - (length * 8)

        msg[off] = byte(window) // window #
        msg[off+1] = byte(length + 1) // octets length

        // Setting the bit value for the type in the right octet
---&gt;    msg[off+2+int(length)] |= byte(1 &lt;&lt; (7 - bit)) 

        lastwindow = window
    }
    off += 2 + int(length)
    off++
}</code></pre>
            <p>The fix was clear and easy: we benchmarked a few different ways to zero a buffer and updated the code like this:</p>
            <pre><code>// zeroBuf is a big buffer of zero bytes, used to zero out the buffers passed
// to PackBuffer.
var zeroBuf = make([]byte, 65535)

var bufpool = sync.Pool{
    New: func() interface{} {
        return make([]byte, 0, 2048)
    },
}

[...]

    data := bufpool.Get().([]byte)
    defer bufpool.Put(data)
    copy(data[0:cap(data)], zeroBuf)

    if data, err = r.Response.PackBuffer(data); err != nil {</code></pre>
            <p>Note: <a href="https://github.com/golang/go/commit/f03c9202c43e0abb130669852082117ca50aa9b1">a recent optimization</a> turns zeroing range loops into <code>memclr</code> calls, so once Go 1.5 lands that will be much faster than <code>copy()</code>.</p><p>But this was a boring fix! Wouldn't it be nicer if we could trust our library to work with any buffer we pass it? Luckily, this is exactly what coverage-based fuzzing is good for: <i>making sure all code paths behave in a certain way</i>.</p><p>What I did then was write a <code>Fuzz()</code> function that would first parse a message, and then pack it to two different buffers: one filled with zeroes and one filled with non-zero bytes. <i>Any difference between the two results would signal cases where the underlying buffer is leaking into the output.</i></p>
            <pre><code>func Fuzz(rawMsg []byte) int {
    var (
        msg         = &amp;dns.Msg{}
        buf, bufOne = make([]byte, 100000), make([]byte, 100000)
        res, resOne []byte

        unpackErr, packErr error
    )

    if unpackErr = msg.Unpack(rawMsg); unpackErr != nil {
        return 0
    }

    if res, packErr = msg.PackBuffer(buf); packErr != nil {
        return 0
    }

    for i := range res {
        bufOne[i] = 1
    }

    resOne, packErr = msg.PackBuffer(bufOne)
    if packErr != nil {
        println("Pack failed only with a filled buffer")
        panic(packErr)
    }

    if !bytes.Equal(res, resOne) {
        println("buffer bits leaked into the packed message")
        println(hex.Dump(res))
        println(hex.Dump(resOne))
        os.Exit(1)
    }

    return 1
}</code></pre>
            <p>I wish I could show here, too, a PR fixing all the bugs, but go-fuzz did its job all too well and we are still triaging and fixing what it finds.</p><p>Anyway, once the fixes are done and go-fuzz falls silent, we will be free to drop the buffer zeroing step without worry, with no need to audit the whole codebase!</p><p><i>Do you fancy fuzzing the libraries that serve 43 billion queries per day? We are </i><a href="https://www.cloudflare.com/join-our-team"><i>hiring</i></a><i> in London, San Francisco and Singapore!</i></p> ]]></content:encoded>
            <category><![CDATA[RRDNS]]></category>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Tools]]></category>
            <category><![CDATA[Go]]></category>
            <guid isPermaLink="false">7zu5Cq14O6t3QJfjOHY6b7</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[A deep look at CVE-2015-5477 and how CloudFlare Virtual DNS customers are protected]]></title>
            <link>https://blog.cloudflare.com/a-deep-look-at-cve-2015-5477-and-how-cloudflare-virtual-dns-customers-are-protected/</link>
            <pubDate>Tue, 04 Aug 2015 10:36:24 GMT</pubDate>
            <description><![CDATA[ Last week ISC published a patch for a critical remotely exploitable vulnerability in the BIND9 DNS server capable of causing a crash with a single packet.

 ]]></description>
            <content:encoded><![CDATA[ <p>Last week ISC <a href="https://kb.isc.org/article/AA-01272">published</a> a patch for a critical remotely exploitable vulnerability in the BIND9 DNS server capable of causing a crash with a single packet.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6WoX0CyBmbXc0MHG4MIfyZ/928cd2fba9294bdddcb0f11bfd5dd6df/8567150970_df04ccbee3_z.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a> <a href="https://www.flickr.com/photos/rarvesen/8566054615/in/album-72157633018017313/">image</a> by <a href="https://www.flickr.com/photos/rarvesen/">Ralph Aversen</a></p><p>The public summary tells us that a mistake in the handling of queries for the TKEY type causes an assertion to fail, which in turn crashes the server. Since the assertion happens during query parsing, there is no way to avoid it: it's the first thing that happens on receiving a packet, before any decision is made about what to do with it.</p><p><a href="https://tools.ietf.org/html/rfc2930">TKEY queries</a> are used in the context of <a href="https://tools.ietf.org/html/rfc2845">TSIG</a>, a protocol DNS servers can use to authenticate to each other. They are special in that, unlike normal DNS queries, they include a “meta” record (of type TKEY) in the EXTRA/ADDITIONAL section of the message.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/Wu8uM3jN01IT813j8A6Hz/905ae629b6a09fd7e1d615a32cad55bf/8567150708_a63cd2cc2b_z.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a> <a href="https://www.flickr.com/photos/rarvesen/8566054615/in/album-72157633018017313/">image</a> by <a href="https://www.flickr.com/photos/rarvesen/">Ralph Aversen</a></p><p>Since the exploit packet is now public, I thought we might take a dive and look at the vulnerable code. Let's start by taking a look at the output of a crashing instance:</p>
            <pre><code>03-Aug-2015 16:38:55.509 message.c:2352: REQUIRE(*name == ((void*)0)) failed, back trace
03-Aug-2015 16:38:55.510 #0 0x10001510d in assertion_failed()+0x5d
03-Aug-2015 16:38:55.510 #1 0x1001ee56a in isc_assertion_failed()+0xa
03-Aug-2015 16:38:55.510 #2 0x1000bc31d in dns_message_findname()+0x1ad
03-Aug-2015 16:38:55.510 #3 0x10017279c in dns_tkey_processquery()+0xfc
03-Aug-2015 16:38:55.510 #4 0x100016945 in ns_query_start()+0x695
03-Aug-2015 16:38:55.510 #5 0x100008673 in client_request()+0x18d3
03-Aug-2015 16:38:55.510 #6 0x1002125fe in run()+0x3ce
03-Aug-2015 16:38:55.510 exiting (due to assertion failure)
[1]    37363 abort (core dumped)  ./bin/named/named -f -c named.conf</code></pre>
            <p>This is extremely helpful: after all, this is a controlled crash caused by a failed assertion, so it tells us what failed and where: <code>message.c:2352</code>. Here's the excerpt.</p>
            <pre><code>// https://source.isc.org/git/bind9.git -- faa3b61 -- lib/dns/message.c

    isc_result_t
    dns_message_findname(dns_message_t *msg, dns_section_t section,
                 dns_name_t *target, dns_rdatatype_t type,
                 dns_rdatatype_t covers, dns_name_t **name,
                 dns_rdataset_t **rdataset)
    {
        dns_name_t *foundname;
        isc_result_t result;
    
        /*
         * XXX These requirements are probably too intensive, especially
         * where things can be NULL, but as they are they ensure that if
         * something is NON-NULL, indicating that the caller expects it
         * to be filled in, that we can in fact fill it in.
         */
        REQUIRE(msg != NULL);
        REQUIRE(VALID_SECTION(section));
        REQUIRE(target != NULL);
        if (name != NULL)
==&gt;         REQUIRE(*name == NULL);

    [...]</code></pre>
            <p>What we have here is a function, <code>dns_message_findname</code>, that searches for an RRset with the given name and type in the given message section. It employs a really common C API pattern: to get the results, the caller passes pointers that will be filled in (<code>dns_name_t **name, dns_rdataset_t **rdataset</code>).</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1vYrM6VfswE12W68qh9f5m/1f18e1d7bc0576003ac7c3979586fb03/8566054615_c1c58976a3_z.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a> <a href="https://www.flickr.com/photos/rarvesen/8566054615/in/album-72157633018017313/">image</a> by <a href="https://www.flickr.com/photos/rarvesen/">Ralph Aversen</a></p><p>As the big comment ironically acknowledges, it's really strict when validating these pointers: if they don't point to <code>(dns_name_t *)NULL</code> the REQUIRE assertion will fail and the server will crash with no attempt at recovery. Code calling this function must take extra care to pass a pointer to a NULL <code>dns_name_t *</code>, which the function will fill in to return the found name.</p><p>In non-memory-safe languages it is not uncommon to crash when a programmer assertion is violated, because a program might not be able to clean up its own memory after something that is not supposed to happen happens.</p><p>So we continue our investigation by climbing up the stack trace to find the illegal call. Next step is <code>dns_tkey_processquery</code>. Here is a simplified excerpt:</p>
            <pre><code>// https://source.isc.org/git/bind9.git -- faa3b61 -- lib/dns/tkey.c

isc_result_t
dns_tkey_processquery(dns_message_t *msg, dns_tkeyctx_t *tctx,
              dns_tsig_keyring_t *ring)
{
    isc_result_t result = ISC_R_SUCCESS;
    dns_name_t *qname, *name;
    dns_rdataset_t *tkeyset;

    /*
     * Interpret the question section.
     */
    result = dns_message_firstname(msg, DNS_SECTION_QUESTION);
    if (result != ISC_R_SUCCESS)
        return (DNS_R_FORMERR);

    qname = NULL;
    dns_message_currentname(msg, DNS_SECTION_QUESTION, &amp;qname);

    /*
     * Look for a TKEY record that matches the question.
     */
    tkeyset = NULL;
    name = NULL;
    result = dns_message_findname(msg, DNS_SECTION_ADDITIONAL, qname,
                      dns_rdatatype_tkey, 0, &amp;name, &amp;tkeyset);
    if (result != ISC_R_SUCCESS) {
        /*
         * Try the answer section, since that's where Win2000
         * puts it.
         */
        if (dns_message_findname(msg, DNS_SECTION_ANSWER, qname,
                     dns_rdatatype_tkey, 0, &amp;name,
                     &amp;tkeyset) != ISC_R_SUCCESS) {
            result = DNS_R_FORMERR;
            tkey_log("dns_tkey_processquery: couldn't find a TKEY "
                 "matching the question");
            goto failure;
        }
    }

[...]</code></pre>
            <p>There are two <code>dns_message_findname</code> calls here. Since we are looking for the one that passes a dirty <code>name</code>, we can ignore the first one, which is preceded by an explicit <code>name = NULL;</code>.</p><p>The second call is more interesting. The same <code>dns_name_t *name</code> is reused without resetting it to NULL after the previous <code>dns_message_findname</code> call. This must be where the bug is.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/56BHTJoLMp0YSCIWfkeVk8/92cb78c44941ef2e8defe736da554296/8566054163_6f3f9e42da_z.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a> <a href="https://www.flickr.com/photos/rarvesen/8566054163/in/album-72157633018017313/">image</a> by <a href="https://www.flickr.com/photos/rarvesen/">Ralph Aversen</a></p><p>Now the question is: when would <code>dns_message_findname</code> set <code>name</code> but not return <code>ISC_R_SUCCESS</code> (so that the <i>if</i> is satisfied)? Let's have a look at the full function body now.</p>
            <pre><code>// https://source.isc.org/git/bind9.git -- faa3b61 -- lib/dns/message.c

isc_result_t
dns_message_findname(dns_message_t *msg, dns_section_t section,
             dns_name_t *target, dns_rdatatype_t type,
             dns_rdatatype_t covers, dns_name_t **name,
             dns_rdataset_t **rdataset)
{
    dns_name_t *foundname;
    isc_result_t result;

    /*
     * XXX These requirements are probably too intensive, especially
     * where things can be NULL, but as they are they ensure that if
     * something is NON-NULL, indicating that the caller expects it
     * to be filled in, that we can in fact fill it in.
     */
    REQUIRE(msg != NULL);
    REQUIRE(VALID_SECTION(section));
    REQUIRE(target != NULL);
    if (name != NULL)
        REQUIRE(*name == NULL);
    if (type == dns_rdatatype_any) {
        REQUIRE(rdataset == NULL);
    } else {
        if (rdataset != NULL)
            REQUIRE(*rdataset == NULL);
    }

    result = findname(&amp;foundname, target,
              &amp;msg-&gt;sections[section]);

    if (result == ISC_R_NOTFOUND)
        return (DNS_R_NXDOMAIN);
    else if (result != ISC_R_SUCCESS)
        return (result);

    if (name != NULL)
        *name = foundname;

    /*
     * And now look for the type.
     */
    if (type == dns_rdatatype_any)
        return (ISC_R_SUCCESS);

    result = dns_message_findtype(foundname, type, covers, rdataset);
    if (result == ISC_R_NOTFOUND)
        return (DNS_R_NXRRSET);

    return (result);
}</code></pre>
            <p>As you can see <code>dns_message_findname</code> uses first <code>findname</code> to match the records with the target name, and then <code>dns_message_findtype</code> to match the target type. In between the two calls... <code>*name = foundname</code>! So if <code>dns_message_findname</code> can find a record with <code>name == qname</code> in <code>DNS_SECTION_ADDITIONAL</code> but then it turns out not to have type <code>dns_rdatatype_tkey</code>, <code>name</code> will be filled in and a failure returned. The second <code>dns_message_findname</code> call will trigger on the dirty <code>name</code> and... boom.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/V4HKcvOVDy2i5aFe2FJAF/d9fc70961af38c66aef05a9800bb8792/8566054013_9202ac3209_z.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a> <a href="https://www.flickr.com/photos/rarvesen/8566054163/in/album-72157633018017313/">image</a> by <a href="https://www.flickr.com/photos/rarvesen/">Ralph Aversen</a></p><p>Indeed, the patch just adds <code>name = NULL</code> before the second call. (No, we couldn't have started our investigation from the patch; what's the fun in that!?)</p>
            <pre><code>diff --git a/lib/dns/tkey.c b/lib/dns/tkey.c
index 66210d5..34ad90b 100644
--- a/lib/dns/tkey.c
+++ b/lib/dns/tkey.c
@@ -654,6 +654,7 @@ dns_tkey_processquery(dns_message_t *msg, dns_tkeyctx_t *tctx,
 		 * Try the answer section, since that's where Win2000
 		 * puts it.
 		 */
+		name = NULL;
 		if (dns_message_findname(msg, DNS_SECTION_ANSWER, qname,
 					 dns_rdatatype_tkey, 0, &amp;name,
 					 &amp;tkeyset) != ISC_R_SUCCESS) {</code></pre>
            <p>To recap, here is the bug flow:</p><ul><li><p>a <b>query for type TKEY</b> is received, <code>dns_tkey_processquery</code> is called to parse it</p></li><li><p><code>dns_message_findname</code> is called a first time on the EXTRA section</p></li><li><p><b>a record with the same name as the query is found in the EXTRA section</b>, causing <code>name</code> to be filled, <b>but it's not a TKEY record</b>, causing <code>result != ISC_R_SUCCESS</code></p></li><li><p><code>dns_message_findname</code> is called a second time to look in the ANS section, and it is passed the now dirty <code>name</code> reference</p></li><li><p>the assertion <code>*name != NULL</code> fails, <b>BIND crashes</b></p></li></ul><p>This bug <a href="https://twitter.com/ISCdotORG/status/626132833849905152">was found with</a> the amazing <a href="http://lcamtuf.coredump.cx/afl/"><i>american fuzzy lop</i></a> fuzzer by <a href="https://twitter.com/@jfoote_">@jfoote_</a>. A fuzzer is an automated tool that keeps feeding automatically mutated inputs to a target program until it crashes. You can see how it eventually stumbled upon the TKEY query + non-TKEY EXTRA RR combo and found this bug.</p>
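The use-after-failure pattern at the heart of the bug is easy to reproduce outside of BIND. Below is a toy Go model of the bug flow above (all names are hypothetical sketches, not BIND code): the first lookup dirties its out-parameter even though it fails, and a second lookup guarded by a REQUIRE-style precondition then aborts unless the caller resets the pointer first, just like the patch does.

```go
package main

import (
	"errors"
	"fmt"
)

// findName is a toy model of dns_message_findname (hypothetical, not BIND
// code): it fills in the out-parameter as soon as the name matches, even if
// the type check then fails.
func findName(section map[string]string, qname, qtype string, out *string) error {
	rtype, ok := section[qname]
	if !ok {
		return errors.New("NXDOMAIN")
	}
	*out = qname // filled in before the type check...
	if rtype != qtype {
		return errors.New("NXRRSET") // ...so a failure leaves it dirty
	}
	return nil
}

// strictFindName models the second call, guarded by the REQUIRE(*name == NULL)
// precondition that aborts BIND.
func strictFindName(section map[string]string, qname, qtype string, out *string) error {
	if *out != "" {
		panic("REQUIRE(*name == NULL) failed")
	}
	return findName(section, qname, qtype, out)
}

func main() {
	additional := map[string]string{"example.com.": "A"} // same name, not TKEY
	answer := map[string]string{}                        // no TKEY here either

	name := new(string)
	_ = findName(additional, "example.com.", "TKEY", name) // fails, dirties name

	*name = "" // the one-line fix from the patch: reset before reuse
	fmt.Println(strictFindName(answer, "example.com.", "TKEY", name)) // NXDOMAIN instead of a crash
}
```

Delete the `*name = ""` line and the program panics, mirroring the BIND assertion failure.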
    <div>
      <h3>Virtual DNS customers have always been protected</h3>
      <a href="#virtual-dns-customers-have-always-been-protected">
        
      </a>
    </div>
    <p>Good news! <a href="https://www.cloudflare.com/virtual-dns">CloudFlare Virtual DNS</a> customers have always been protected from this attack, even if they run BIND. Our custom Go DNS server, RRDNS, parses and sanitizes all queries before forwarding them to the origin servers if needed.</p><p>Since Virtual DNS does not support TSIG and TKEY (which are meant to authenticate server-to-server traffic, not recursive lookups), it has no reason to relay EXTRA section records in queries, so it doesn't! That reduces the attack surface and indeed makes it impossible to exploit this vulnerability through Virtual DNS.</p><p>No special rules are in place to protect from this specific vulnerability: RRDNS always validates incoming packets, making sure they look like regular queries, and strips them down to the simplest form possible before relaying them.</p>
            <category><![CDATA[Vulnerabilities]]></category>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Programming]]></category>
            <guid isPermaLink="false">60sgN5iyY1xyuGiTzlqnxo</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Quick and dirty annotations for Go stack traces]]></title>
            <link>https://blog.cloudflare.com/quick-and-dirty-annotations-for-go-stack-traces/</link>
            <pubDate>Mon, 03 Aug 2015 11:26:04 GMT</pubDate>
            <description><![CDATA[ CloudFlare’s DNS server, RRDNS, is entirely written in Go and typically runs tens of thousands of goroutines. Since goroutines are cheap and Go I/O is blocking, we run one goroutine per file descriptor we listen on and queue new packets for processing. ]]></description>
            <content:encoded><![CDATA[ <p>CloudFlare’s DNS server, <a href="/tag/rrdns/">RRDNS</a>, is entirely written in Go and typically runs tens of thousands of goroutines. Since goroutines are cheap and Go I/O is blocking, we run one goroutine per file descriptor we listen on and queue new packets for processing.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/2v4jReMMfDFsc62GnuXpFo/13aafcf00d9d9b425cdf294f62fa7f76/6372385465_014a4e56af_z.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by-sa/2.0/">CC BY-SA 2.0</a> <a href="https://www.flickr.com/photos/wiredforsound23/6372385465/">image</a> by <a href="https://www.flickr.com/photos/wiredforsound23/">wiredforlego</a></p><p>When there are thousands of goroutines running, debug output quickly becomes difficult to interpret. For example, last week I was tracking down a problem with a file descriptor and wanted to know what its listening goroutine was doing. <i>With 40k stack traces, good luck figuring out which one is having trouble.</i></p><p>Go stack traces include parameter values, but most Go types are (or are implemented as) pointers, so what you will see passed to the goroutine function is just a meaningless memory address.</p><p>We have a couple of options to make sense of the addresses: get a heap dump at the same time as the stack trace and cross-reference the pointers, or have a debug endpoint that prints a goroutine/pointer -&gt; IP map. Neither is seamless.</p>
    <div>
      <h3>Underscore to the rescue</h3>
      <a href="#underscore-to-the-rescue">
        
      </a>
    </div>
    <p>However, we know that <i>integers are shown in traces</i>, so what we did is first <i>convert IPv4 addresses to their uint32 representation</i>:</p>
            <pre><code>// addrToUint32 takes a TCPAddr or UDPAddr and converts its IP to a uint32.
// If the IP is v6, 0xffffffff is returned.
func addrToUint32(addr net.Addr) uint32 {
       var ip net.IP
       switch addr := addr.(type) {
       case *net.TCPAddr:
               ip = addr.IP
       case *net.UDPAddr:
               ip = addr.IP
       case *net.IPAddr:
               ip = addr.IP
       }
       if ip == nil {
               return 0
       }
       ipv4 := ip.To4()
       if ipv4 == nil {
               return math.MaxUint32
       }
       return uint32(ipv4[0])&lt;&lt;24 | uint32(ipv4[1])&lt;&lt;16 | uint32(ipv4[2])&lt;&lt;8 | uint32(ipv4[3])
}</code></pre>
            <p>And then <i>pass the IPv4-as-uint32 to the listening goroutine as an </i><code><i>_</i></code><i> parameter</i>. Yes, as a parameter with name <code>_</code>; it's known as the <a href="https://golang.org/ref/spec#Blank_identifier">blank identifier</a> in Go.</p>
            <pre><code>// PacketUDPRead is a goroutine that listens on a specific UDP socket and reads
// in new requests
// The first parameter is the int representation of the listening IP address,
// and it's passed just so it will appear in stack traces
func PacketUDPRead(_ uint32, conn *net.UDPConn, ...) { ... }

go PacketUDPRead(addrToUint32(conn.LocalAddr()), conn, ...)</code></pre>
            <p>Now when we get a stack trace, we can just look at the first bytes, convert them back to dotted notation, and know on what IP the goroutine was listening.</p>
            <pre><code>goroutine 42 [IO wait]:
	[...]
	/.../request.go:195 +0x5d
rrdns/core.PacketUDPRead(0xc27f000001, 0x2b6328113ad8, 0xc20801ecc0, 0xc208044308, 0xc208e99280, 0xc208ad8180, 0x12a05f200)
	/.../server.go:119 +0x35a
created by rrdns/core.PacketIO
	/.../server.go:230 +0x8be</code></pre>
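Decoding the first argument of the `PacketUDPRead` frame back into dotted notation takes only a couple of standard-library calls; here is a small illustrative helper (not part of RRDNS):

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// uint32ToAddr is the inverse of addrToUint32 above: it turns the integer
// seen in a stack trace back into dotted notation.
func uint32ToAddr(v uint32) string {
	b := make([]byte, 4)
	binary.BigEndian.PutUint32(b, v) // most significant byte first
	return net.IP(b).String()
}

func main() {
	// 0xc27f000001 from the trace, with the leading alignment byte dropped.
	fmt.Println(uint32ToAddr(0x7f000001)) // 127.0.0.1
}
```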
            <p><code>0xc27f000001</code> -&gt; remove alignment byte -&gt; <code>0x7f000001</code> -&gt; <code>127.0.0.1</code></p><p>Obviously you can do the same with any piece of information you can represent as an <code>int</code>.</p><p><i>Are you interested in taming the goroutines that run the web? We're </i><a href="https://www.cloudflare.com/join-our-team"><i>hiring</i></a><i> in London, San Francisco and Singapore!</i></p> ]]></content:encoded>
            <category><![CDATA[RRDNS]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[Go]]></category>
            <category><![CDATA[Programming]]></category>
            <guid isPermaLink="false">2Eu6vhCtFVL0I002N3mpV2</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Setting Go variables from the outside]]></title>
            <link>https://blog.cloudflare.com/setting-go-variables-at-compile-time/</link>
            <pubDate>Wed, 01 Jul 2015 13:26:37 GMT</pubDate>
            <description><![CDATA[ CloudFlare's DNS server, RRDNS, is written in Go and the DNS team used to generate a file called version.go in our Makefile. version.go looked something like this. ]]></description>
            <content:encoded><![CDATA[ <p>CloudFlare's DNS server, <a href="/tag/rrdns/">RRDNS</a>, is written in Go and the DNS team used to generate a file called <code>version.go</code> in our Makefile. <code>version.go</code> looked something like this:</p>
            <pre><code>// THIS FILE IS AUTOGENERATED BY THE MAKEFILE. DO NOT EDIT.

// +build	make

package version

var (
	Version   = "2015.6.2-6-gfd7e2d1-dev"
	BuildTime = "2015-06-16-0431 UTC"
)</code></pre>
            <p>and was used to embed version information in RRDNS. It was built inside the Makefile using <code>sed</code> and <code>git describe</code> from a template file. It worked, but was pretty ugly.</p><p>Today we noticed that another Go team at CloudFlare, the <a href="/tag/data/">Data</a> team, had <b>a much smarter way to bake version numbers into binaries using the </b><code><b>-X</b></code><b> linker option</b>.</p><p>The <code>-X</code> <a href="https://golang.org/cmd/ld/">Go linker option</a>, which you can set with <code>-ldflags</code>, sets the value of a string variable in the Go program being linked. You use it like this: <code>-X main.version 1.0.0</code>.</p><p>A simple example: let's say you have this source file saved as <code>hello.go</code>.</p>
            <pre><code>package main

import "fmt"

var who = "World"

func main() {
    fmt.Printf("Hello, %s.\n", who)
}</code></pre>
            <p>Then you can use <code>go run</code> (or other build commands like <code>go build</code> or <code>go install</code>) with the <code>-ldflags</code> option to modify the value of the <code>who</code> variable:</p>
            <pre><code>$ go run hello.go
Hello, World.
$ go run -ldflags="-X main.who CloudFlare" hello.go
Hello, CloudFlare.</code></pre>
            <p>The format is <code>importpath.name string</code>, so it's possible to set the value of any string anywhere in the Go program, not just in main. Note that from Go 1.5 the syntax has <a href="https://tip.golang.org/cmd/link/">changed</a> to <code>importpath.name=string</code>. The old style is still supported but the linker will complain.</p><p>I was worried this would not work with external linking (<a href="https://docs.google.com/document/d/1nr-TQHw_er6GOQRsF6T43GGhFDelrAP0NqSS_00RgZQ/view">for example when using cgo</a>) but as we can see with <code>-ldflags="-linkmode=external -v"</code> the Go linker runs first anyway and takes care of our <code>-X</code>.</p>
            <pre><code>$ go build -x -ldflags="-X main.who CloudFlare -linkmode=external -v" hello.go
WORK=/var/folders/v8/xdj2snz51sg2m2bnpmwl_91c0000gn/T/go-build149644699
mkdir -p $WORK/command-line-arguments/_obj/
cd /Users/filippo/tmp/X
/usr/local/Cellar/go/1.4.2/libexec/pkg/tool/darwin_amd64/6g -o $WORK/command-line-arguments.a -trimpath $WORK -p command-line-arguments -complete -D _/Users/filippo/tmp/X -I $WORK -pack ./hello.go
cd .
/usr/local/Cellar/go/1.4.2/libexec/pkg/tool/darwin_amd64/6l -o hello -L $WORK -X main.hi hi -linkmode=external -v -extld=clang $WORK/command-line-arguments.a
# command-line-arguments
HEADER = -H1 -T0x2000 -D0x0 -R0x1000
searching for runtime.a in $WORK/runtime.a
searching for runtime.a in /usr/local/Cellar/go/1.4.2/libexec/pkg/darwin_amd64/runtime.a
 0.06 deadcode
 0.07 pclntab=284969 bytes, funcdata total 49800 bytes
 0.07 dodata
 0.08 symsize = 0
 0.08 symsize = 0
 0.08 reloc
 0.09 reloc
 0.09 asmb
 0.09 codeblk
 0.09 datblk
 0.09 dwarf
 0.09 sym
 0.09 headr
host link: clang -m64 -gdwarf-2 -Wl,-no_pie,-pagezero_size,4000000 -o hello -Qunused-arguments /var/folders/v8/xdj2snz51sg2m2bnpmwl_91c0000gn/T//go-link-mFNNCD/000000.o /var/folders/v8/xdj2snz51sg2m2bnpmwl_91c0000gn/T//go-link-mFNNCD/000001.o /var/folders/v8/xdj2snz51sg2m2bnpmwl_91c0000gn/T//go-link-mFNNCD/go.o -g -O2 -g -O2 -lpthread
 0.17 cpu time
33619 symbols
64 sizeof adr
216 sizeof prog
23412 liveness data</code></pre>
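Tying this back to `version.go`: with the Go 1.5+ `=` syntax mentioned above, the whole Makefile/`sed` dance collapses to plain variables plus linker flags. A sketch (the variable names and the `git describe` invocation are illustrative, not our actual build):

```go
package main

import "fmt"

// Version and BuildTime default to placeholders and are meant to be
// overridden at link time with the Go 1.5+ `=` syntax, e.g.:
//   go build -ldflags "-X main.Version=$(git describe --always)" versioned.go
var (
	Version   = "unknown"
	BuildTime = "unknown"
)

func main() {
	// Without -ldflags, the placeholders are printed.
	fmt.Printf("built %s at %s\n", Version, BuildTime)
}
```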
            <p><i>Do you want to work next to Go developers that can always make you learn a new trick? </i><a href="https://www.cloudflare.com/join-our-team"><i>We are hiring in London, San Francisco and Singapore</i></a><i>.</i></p> ]]></content:encoded>
            <category><![CDATA[RRDNS]]></category>
            <category><![CDATA[Programming]]></category>
            <category><![CDATA[Go]]></category>
            <category><![CDATA[Reliability]]></category>
            <guid isPermaLink="false">7JoU9nV9zNjoGpJYgiusvV</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Go has a debugger—and it's awesome!]]></title>
            <link>https://blog.cloudflare.com/go-has-a-debugger-and-its-awesome/</link>
            <pubDate>Thu, 18 Jun 2015 11:14:00 GMT</pubDate>
            <description><![CDATA[ Something that often, uh... bugs Go developers is the lack of a proper debugger. Builds are ridiculously fast and easy, but sometimes it would be nice to just set a breakpoint and step through that endless if chain or print a bunch of values without recompiling ten times. ]]></description>
            <content:encoded><![CDATA[ <p>Something that often, uh... <i>bugs</i><a href="#fn1">[1]</a> Go developers is the <b>lack of a proper debugger</b>. Sure, builds are ridiculously fast and easy, and <code>println(hex.Dump(b))</code> is your friend, but sometimes it would be nice to just set a breakpoint and step through that endless <code>if</code> chain or print a bunch of values without recompiling ten times.</p>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/1PlA0CeyPc2zJ6Ai6u9f9u/77f31929ad59993a59bb24367a46d852/12294903084_3a3d128ae7_z.jpg" />
            
            </figure><p><a href="https://creativecommons.org/licenses/by/2.0/">CC BY 2.0</a> <a href="https://www.flickr.com/photos/62766743@N07/12294903084/in/photolist-jJsAkE-hiHrhB-9TNjzG-9TMKnB-9TKuyt-9TKuQx-4rHRku-9TNj1L-dCD4Ay-bbk7in-ngEQwy-q577yv-qmsFPs-qXFbRy-dCMyqk-rmqu1H-tncWw9-fzkCLf-54MZxq-9ZCivM-fdC6b-5jvVQ7-q4YkxA-2vVkpu-aY6pnx-9TNiVC-j8TKCC-9TNji3-dKjVwD-eRrMtP-dVJA3D-bwjW2u-ohnZh9-iRdXBy-dWXXKe-fdT8VT-ePmAs-ecdQqy-ieu7sA-iFi5z-j6m1Qs-ncgQ2q-7W3hJi-r17FpD-ekipUs-jYbRdy-ckWNBh-gT4VL-9TNjvC-9TNjpL">image</a> by <a href="https://www.flickr.com/photos/62766743@N07/">Carl Milner</a></p><p>You <i>could</i> try to use some dirty gdb hacks that will work if you built your binary with a certain linker and ran it on some architectures when the moon was in a waxing crescent phase, but let's be honest, it isn't an enjoyable experience.</p><p>Well, worry no more! <a href="https://github.com/mailgun/godebug">godebug</a> is here!</p><p><b>godebug is an awesome cross-platform debugger</b> created by the Mailgun team. You can read <a href="http://blog.mailgun.com/introducing-a-new-cross-platform-debugger-for-go/">their introduction</a> for some under-the-hood details, but here's the cool bit: instead of wrestling with half a dozen different ptrace interfaces that would not be portable, <b>godebug rewrites your source code</b> and injects function calls like <code>godebug.Line</code> on every line, <code>godebug.Declare</code> at every variable declaration, and <code>godebug.SetTrace</code> for breakpoints (i.e. wherever you type <code>_ = "breakpoint"</code>).</p><p>I find this solution brilliant. What you get out of it is a (possibly cross-compiled) debug-enabled binary that you can drop on a staging server just like you would with a regular binary. When a breakpoint is reached, the program will stop inline and wait for you on stdin. 
<b>It's the single-binary, zero-dependencies philosophy of Go that we love applied to debugging.</b> Builds everywhere, runs everywhere, with no need for tools or permissions on the server. It even compiles to JavaScript with gopherjs (check out the Mailgun post above—show-offs ;) ).</p><p>You might ask, "But does it get a decent runtime speed or work with big applications?" Well, the other day I was seeing RRDNS—our in-house Go DNS server—hit a weird branch, so I placed a breakpoint a couple lines above the <i>if</i> in question, <b>recompiled the whole of RRDNS with godebug instrumentation</b>, dropped the binary on a staging server, and replayed some DNS traffic.</p>
            <pre><code>filippo@staging:~$ ./rrdns -config config.json
-&gt; _ = "breakpoint"
(godebug) l

    q := r.Query.Question[0]

--&gt; _ = "breakpoint"

    if !isQtypeSupported(q.Qtype) {
        return
(godebug) n
-&gt; if !isQtypeSupported(q.Qtype) {
(godebug) q
dns.Question{Name:"filippo.io.", Qtype:0x1, Qclass:0x1}
(godebug) c</code></pre>
            <p>Boom. The request and the debug log paused (make sure to terminate any timeout you have in your tools), waiting for me to step through the code.</p><p>Sold yet? Here's how you use it: simply run <code>godebug {build|run|test}</code> instead of <code>go {build|run|test}</code>. <a href="https://github.com/mailgun/godebug/pull/32/commits">We adapted godebug</a> to resemble the go tool as much as possible. Remember to use <code>-instrument</code> if you want to be able to step into packages that are not <i>main</i>.</p><p>For example, here is part of the RRDNS Makefile:</p>
            <pre><code>bin/rrdns:
ifdef GODEBUG
	GOPATH="${PWD}" go install github.com/mailgun/godebug
	GOPATH="${PWD}" ./bin/godebug build -instrument "${GODEBUG}" -o bin/rrdns rrdns
else
	GOPATH="${PWD}" go install rrdns
endif

test:
ifdef GODEBUG
	GOPATH="${PWD}" go install github.com/mailgun/godebug
	GOPATH="${PWD}" ./bin/godebug test -instrument "${GODEBUG}" rrdns/...
else
	GOPATH="${PWD}" go test rrdns/...
endif</code></pre>
            <p>Debugging is just a <code>make bin/rrdns GODEBUG=rrdns/...</code> away.</p><p>This tool is still young, but in my experience, perfectly functional. The UX could use some love if you can spare some time (as you can see above it's pretty spartan), but it should be easy to build on what's there already.</p>
    <div>
      <h2>About source rewriting</h2>
      <a href="#about-source-rewriting">
        
      </a>
    </div>
    <p>Before closing, I'd like to say a few words about the technique of source rewriting in general. It powers many different Go tools, like <a href="https://blog.golang.org/cover">test coverage</a>, <a href="https://github.com/dvyukov/go-fuzz">fuzzing</a> and, indeed, debugging. It's made possible primarily by Go’s blazing-fast compiles, and it enables amazing cross-platform tools to be built easily.</p><p>However, since it's such a handy and powerful pattern, I feel like <b>there should be a standard way to apply it in the context of the build process</b>. After all, all the source rewriting tools need to implement a subset of the following features:</p><ul><li><p>Wrap the main function</p></li><li><p>Conditionally rewrite source files</p></li><li><p>Keep global state</p></li></ul><p>Why should every tool have to reinvent all the boilerplate to copy the source files, rewrite the source, make sure stale objects are not used, build the right packages, run the right tests, and interpret the CLI..? Basically, all of <a href="https://github.com/mailgun/godebug/blob/f8742f647adb8ee17a1435de3b1929d36df590c8/cmd.go">godebug/cmd.go</a>. And what about <a href="http://getgb.io/">gb</a>, for example?</p><p>I think we need a framework for Go source code rewriting tools. (Spoiler, spoiler, ...)</p><p><i>If you’re interested in working on Go servers at scale and developing tools to do it better, remember </i><a href="https://www.cloudflare.com/join-our-team"><i>we’re hiring in London, San Francisco, and Singapore</i></a><i>!</i></p><hr /><hr /><ol><li><p>I'm sorry. <a href="#fnref1">↩︎</a></p></li></ol> ]]></content:encoded>
            <category><![CDATA[RRDNS]]></category>
            <category><![CDATA[Tools]]></category>
            <category><![CDATA[DNS]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Programming]]></category>
            <category><![CDATA[Go]]></category>
            <category><![CDATA[Developers]]></category>
            <guid isPermaLink="false">7rlszh5ZEwkE3JfCjkJkZv</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Logjam: the latest TLS vulnerability explained]]></title>
            <link>https://blog.cloudflare.com/logjam-the-latest-tls-vulnerability-explained/</link>
            <pubDate>Wed, 20 May 2015 23:52:53 GMT</pubDate>
            <description><![CDATA[ Yesterday, a group from INRIA, Microsoft Research, Johns Hopkins, the University of Michigan, and the University of Pennsylvania published a deep analysis of the Diffie-Hellman algorithm as used in TLS and other protocols.  ]]></description>
            <content:encoded><![CDATA[ <p></p><p><i>Image: "Logjam" </i><a href="https://twitter.com/0xabad1dea/status/600874766527012865"><i>as interpreted by @0xabad1dea</i></a></p><p>Yesterday, a group from INRIA, Microsoft Research, Johns Hopkins, the University of Michigan, and the University of Pennsylvania <a href="https://weakdh.org/imperfect-forward-secrecy.pdf">published</a> a deep analysis of the Diffie-Hellman algorithm as used in TLS and other protocols. This analysis included a novel downgrade attack against the TLS protocol itself called <a href="https://weakdh.org/"><b>Logjam</b></a>, which exploits EXPORT cryptography (just like <a href="/cloudflare-sites-are-protected-from-freak/">FREAK</a>).</p><p>First, let me start by saying that <b>CloudFlare customers are not and were never affected</b>. We don’t support non-EC Diffie-Hellman ciphersuites on either the client or origin side. We also won't touch EXPORT-grade cryptography with a 20ft stick.</p><p>But why are CloudFlare customers safe, and how does Logjam work anyway?</p>
    <div>
      <h3>Diffie-Hellman and TLS</h3>
      <a href="#diffie-hellman-and-tls">
        
      </a>
    </div>
    <p><i>This is a detailed technical introduction to how DH works and how it’s used in TLS—if you already know this and want to read about the attack, skip to “Enter export crypto, enter Logjam” below. If, instead, you are not interested in the nuts and bolts and want to know who’s at risk, skip to “So, what’s affected?”</i></p><p>To start a TLS connection, the two sides—client (the browser) and server (CloudFlare)—need to agree securely on a secret key. This process is called <b>Key Exchange</b> and it happens during the TLS Handshake: the exchange of messages that take place before encrypted data can be transmitted.</p><p>There is a detailed description of the TLS handshake in the first part of <a href="/keyless-ssl-the-nitty-gritty-technical-details/">this previous blog post by Nick Sullivan</a>. In the following, I’ll only discuss the ideas you’ll need to understand the attack at hand.</p><p>There are many types of Key Exchanges: static RSA, Diffie-Hellman (DHE cipher suites), Elliptic Curve Diffie-Hellman (ECDHE cipher suites), and some less used methods.</p><p>An important property of DHE and ECDHE key exchanges is that they provide <a href="/staying-on-top-of-tls-attacks/#forwardsecrecy">Forward Secrecy</a>. That is, even if the server key is compromised at some point, it can’t be used to decrypt past connections. It’s important to protect the information exchanged from future breakthroughs, and we’re proud to say that 94% of CloudFlare connections provide it.</p><p>This research—and this attack—applies to the normal non-EC <b>Diffie-Hellman key exchange (DHE)</b>. 
This is how it works at a high level (don’t worry, I’ll explain each part in more detail below):</p><ol><li><p>The client advertises support for DHE cipher suites when opening a connection (in what is called a Client Hello message)</p></li><li><p>The server picks the parameters and performs its half of the DH computation using those parameters</p></li><li><p>The server signs parameters and its DH share with its long-term certificate and sends the whole thing to the client</p></li><li><p>The client checks the signature, uses the parameters to perform its half of the computation and sends the result to the server</p></li><li><p>Both parties put everything together and derive a shared secret, which they will use as the key to secure the connection</p></li></ol><p>(For a more in-depth analysis of each step see the link to Nick Sullivan’s blog post above, <a href="/keyless-ssl-the-nitty-gritty-technical-details/#ephemeraldiffiehellmanhandshake">“Ephemeral Diffie-Hellman Handshake” section</a>.)</p>
            <figure>
            <a href="http://staging.blog.mrk.cfdata.org/content/images/2015/05/ssl_handshake_diffie_hellman.jpg">
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7cBRR2q8hqa1xKfoWLSFcl/73ac27fa495f52d5983d81b32d814fb4/ssl_handshake_diffie_hellman.jpg" />
            </a>
            </figure><p>Let’s explain some of the terms that just passed by your screen. “The client” is the browser and “the server” is the website (or CloudFlare’s edge serving the website).</p><p>“The parameters” (or <i>group</i>) are some big numbers that are used as the base for the DH computations. <b>They can be, and often are, fixed. The security of the final secret depends on the size of these parameters.</b> This research deemed 512 and 768 bits to be weak, 1024 bits to be breakable by really powerful attackers like governments, and 2048 bits to be a safe size.</p><p>The certificate contains a public key and is what you (or CloudFlare for you) get issued from a CA for your website. The client makes sure it’s issued by a CA it trusts and that it’s valid for the visited website. The server uses the corresponding private key to cryptographically sign its share of the DH key exchange so that the client can be sure it’s agreeing on a connection key with the real owner of the website, not a MitM.</p><p>Finally, the DH computation: there’s a <a href="https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange#Description">beautiful explanation of this on Wikipedia which uses <i>paint</i></a>. The tl;dr is:</p><ol><li><p>The server picks a secret ‘a’</p></li><li><p>Then it computes—using some parameters as a base—a value ‘A’ from it and sends that to the client (not ‘a’!)</p></li><li><p>The client picks a secret ‘b’, takes the parameters from the server and likewise computes a value ‘B’ that it sends to the server</p></li><li><p>Both parties put together ‘a’ + ‘B’ or ‘b’ + ‘A’ to derive a shared, identical secret - which is impossible to compute from ‘A’ + ‘B’, the only things that travelled on the wire</p></li></ol><p>The security of all this depends on the strength/size of the parameters.</p>
    <div>
      <h3>Enter export crypto, enter Logjam</h3>
      <a href="#enter-export-crypto-enter-logjam">
        
      </a>
    </div>
    <p>So far, so good. Diffie-Hellman is nice, it provides Forward Secrecy, it’s secure if the parameters are big enough, and the parameters are picked and signed by the server. So what’s the problem?</p><p>Enter “export cryptography”! <b>Export cryptography</b> is a relic of the 90’s US restrictions on cryptography export. In order to support SSL in countries to which the U.S. had disallowed exporting "strong cryptography", many implementations support weakened modes called EXPORT modes.</p><p>We’ve already seen an attack that succeeded because connections could be forced to use these modes even if they wouldn’t want to; this is what happened with the FREAK vulnerability. It’s telling that 20 years after these modes became useless we are still dealing with the outcome of the added complexity.</p><p>How it works with Diffie-Hellman is that the client requests a <code>DHE_EXPORT</code> ciphersuite instead of the corresponding <code>DHE</code> one. Seeing that, the server (if it supports DHE_EXPORT) <i>picks small, breakable 512-bit parameters</i> for the exchange, and carries on with a regular DHE key exchange. <b>The server doesn’t signal back securely to the client that it picked such small DH parameters because of the EXPORT ciphersuite</b>.</p><p>This is the protocol flaw at the heart of the Logjam “downgrade attack”:</p><ul><li><p>A MitM attacker intercepts a client connection and replaces all the accepted ciphersuites with only the DHE_EXPORT ones</p></li><li><p>The server picks weak 512-bit parameters, does its half of the computation, and signs the parameters with the certificate’s private key. <b>Neither the Client Hello, the client ciphersuites, nor the chosen ciphersuite are signed by the server!</b></p></li><li><p>The client is led to believe that the server picked a DHE Key Exchange and just willingly decided on small parameters. 
From its point of view, it has no way to know that the server was tricked by the MitM into doing so!</p></li><li><p>The attacker would then break one of the two weak DH shares, recover the connection key, and proceed with the TLS connection with the client</p></li></ul>
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/7bbVK9rknxBcXalLiCGfyk/952480fdee17b7a628e076139ef9b002/https-weakdh-org-imperfect-forward-secrecy-pdf-2015-05-21-01-19-48.png" />
            
            </figure><p><a href="https://weakdh.org/imperfect-forward-secrecy.pdf"><i>Imperfect Forward Secrecy: How Diffie-Hellman Fails in Practice</i></a><i> - Figure 2</i></p><p>The client has no other way to protect itself besides drawing a line in the sand about how weak the DHE parameters can be (e.g. at least 1024 bits) and refusing to connect to servers that want to pick smaller ones. This is what all modern browsers are now doing, but it wasn’t done before because it causes breakage, and it was believed that there was no way to trick a server into choosing such weak parameters if it wouldn’t normally do so.</p><p>The servers can protect themselves by refusing EXPORT ciphersuites and never signing small parameters.</p>
    <div>
      <h3>About parameters size and reuse</h3>
      <a href="#about-parameters-size-and-reuse">
        
      </a>
    </div>
    <p>But how small is too small for DH parameters? The Logjam paper also analyzes this in depth. The first thing to understand is that <b>parameters can be and often are reused</b>. 17.9% of the Top 1 Million Alexa Domains used the same 1024-bit parameters.</p><p><b>An attacker can perform the bulk of the computation using only the parameters</b>, and then break any DH exchange that uses them in minutes. So when many sites (or VPN servers, etc.) share the same parameters, the investment of time needed to “break” the parameters makes much more sense, since it would then allow the attacker to break many connections with little extra effort.</p><p>The research team performed the precomputation on the most common 512-bit (EXPORT) parameters to demonstrate the impact of Logjam, but they express concerns that real, more powerful attackers might do the same with the common normal-DHE 1024-bit parameters.</p><p>Finally, in their Internet-wide scan they discovered that many servers will <b>provide vulnerable 512-bit parameters even for non-EXPORT DHE</b>, in order to support older TLS implementations (for example, old Java versions).</p>
    <div>
      <h3>So, what’s affected?</h3>
      <a href="#so-whats-affected">
        
      </a>
    </div>
    <p><b>A client/browser is affected if it accepts small DHE parameters as part of any connection</b>, since it has no way to know that it’s being tricked into a weak EXPORT-level connection. Most major browsers at the time of this writing are vulnerable, but <a href="https://groups.google.com/a/chromium.org/forum/#!msg/security-dev/WyGIpevBV1s">are moving to restrict DH parameters to a 1024-bit minimum</a>. You can check yours by visiting <a href="https://weakdh.org">weakdh.org</a>.</p><p><b>A server/website is vulnerable if it supports the DHE_EXPORT ciphersuites or if it uses small parameters for DHE.</b> You can find a test and instructions on how to fix this at <a href="https://weakdh.org/sysadmin.html">https://weakdh.org/sysadmin.html</a>. 8.4% of Alexa Top Million HTTPS websites were initially vulnerable (with 82% and 10% of them using the same two parameter sets, making precomputation more viable). CloudFlare servers accept neither DHE_EXPORT nor DHE; we offer ECDHE instead.</p><p>Some interesting related statistics: <b>94% of the TLS connections to CloudFlare customer sites use ECDHE</b> (more precisely, 90% of those are <code>ECDHE-RSA-AES</code> of some sort and 10% <a href="/do-the-chacha-better-mobile-performance-with-cryptography/"><code>ECDHE-RSA-CHACHA20-POLY1305</code></a>) and therefore provide Forward Secrecy. The rest use static RSA (5.5% with AES, 0.6% with 3DES).</p><p><i>Both the client and the server need to be vulnerable</i> for the attack to succeed: the server must agree to sign small DHE_EXPORT parameters, and the client must accept them as valid DHE parameters.</p><p>A closing note: <b>events like this are ultimately a good thing for the security industry and the web at large</b>, since they mean that skilled people are looking at what we rely on to secure our connections and fixing its flaws. They also put a spotlight on how the added complexity of supporting reduced-strength crypto and older devices complicates and endangers all of our security efforts.</p><p>If you’ve read this far, found it interesting, and would like to work on things like this, remember that we’re <a href="https://www.cloudflare.com/join-our-team">hiring in London and San Francisco</a>!</p> ]]></content:encoded>
            <category><![CDATA[TLS]]></category>
            <category><![CDATA[Vulnerabilities]]></category>
            <category><![CDATA[Security]]></category>
            <guid isPermaLink="false">0ugl2V2Lvy7zDaehj24Rn</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
        <item>
            <title><![CDATA[Help us test our DNSSEC implementation]]></title>
            <link>https://blog.cloudflare.com/help-us-test-our-dnssec-implementation/</link>
            <pubDate>Thu, 29 Jan 2015 13:03:08 GMT</pubDate>
            <description><![CDATA[ Today is a big day for CloudFlare! We are publishing our first DNSSEC signed zone for the community to analyze and give feedback on. ]]></description>
            <content:encoded><![CDATA[ <p><i>See our previous post for an </i><a href="/dnssec-an-introduction/"><i>introduction to DNSSEC</i></a><i>.</i></p><p>Today is a big day for Cloudflare! We are publishing our first DNSSEC signed zone for the community to analyze and give feedback on:</p><ul><li><p><a href="http://www.cloudflare.com">www.cloudflare.com</a> - a fully signed zone managed by Cloudflare</p></li></ul><p>We've been testing our implementation internally for some time with great results, so now we want to hear from outside users how it’s working!</p><p>Here’s what you should see if you pull the records of, for example, <a href="http://www.cloudflare.com">www.cloudflare.com</a>:</p>
            <pre><code>; &lt;&lt;&gt;&gt; DiG 9.11.18 &lt;&lt;&gt;&gt; www.cloudflare.com A +dnssec
;; global options: +cmd
;; Got answer:
;; -&gt;&gt;HEADER&lt;&lt;- opcode: QUERY, status: NOERROR, id: 57987
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;www.cloudflare.com.		IN	A

;; ANSWER SECTION:
www.cloudflare.com.	294	IN	RRSIG	A 13 3 300 20210114001214 20210111221214 34505 www.cloudflare.com. QZrCZlAC29e5RYjF+Xt9l02bWYhPE9so5EZZHO07oAd1m6x4Ghbt873O t7dipnScuJcdu2zPpvFGAu5f+dtrNg==
www.cloudflare.com.	294	IN	A	104.16.123.96
www.cloudflare.com.	294	IN	A	104.16.124.96

;; Query time: 25 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Tue Jan 12 15:12:17 PST 2021
;; MSG SIZE  rcvd: 193
</code></pre>
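<p>Two details in output like the above tell you validation actually happened: the <code>ad</code> (authenticated data) flag in the header, and an <code>RRSIG</code> record accompanying the answer. A hypothetical helper (not part of any tool) that sanity-checks dig's text output for both:</p>

```python
# Hypothetical helper: given dig's text output, check that the resolver
# reported successful DNSSEC validation (the "ad" flag in the header
# flags line) and that the answer actually carries an RRSIG record.
def looks_validated(dig_output: str) -> bool:
    has_ad = has_rrsig = False
    for line in dig_output.splitlines():
        if line.startswith(";; flags:"):
            flags = line.split("flags:", 1)[1].split(";", 1)[0]
            has_ad = "ad" in flags.split()
        if "RRSIG" in line.split():
            has_rrsig = True
    return has_ad and has_rrsig
```

<p>Keep in mind the <code>ad</code> flag only means your <i>resolver</i> validated the chain; a client that trusts it still has to reach that resolver over a trustworthy path.</p>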
            <p>This is a big step towards our goal of doing with DNSSEC what we did with TLS: making it <b>easy and widespread</b>. We’re working on that and will get there soon.</p><p>DNSSEC presents <a href="/dnssec-complexities-and-considerations/">many complexities</a> that we are addressing by doing DNSSEC in a <b>modern</b> way: by <b>signing on the fly</b> we prevent NSEC records from <a href="/dnssec-complexities-and-considerations/#zonecontentexposure">revealing all of a zone’s subdomains</a>; by using <b>ECDSA</b> we make DNS answers smaller and reduce the risk of <a href="/dnssec-complexities-and-considerations/#reflectionamplificationthreat">reflection attacks</a>; and by providing a fully <b>managed</b> solution we take all of the complexity away from you.</p>
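<p>The size advantage of ECDSA is easy to quantify with back-of-the-envelope figures, assuming P-256 versus RSA-2048:</p>

```python
# Back-of-the-envelope numbers behind "smaller answers": an ECDSA P-256
# signature is two 32-byte integers (r, s), while an RSA-2048 signature
# is one modulus-sized value. Smaller RRSIGs mean smaller UDP responses,
# which also lowers the amplification factor available to reflection
# attacks that bounce signed answers off DNS servers.
ecdsa_p256_sig = 2 * 32      # 64 bytes
rsa_2048_sig = 2048 // 8     # 256 bytes
saving = 1 - ecdsa_p256_sig / rsa_2048_sig
print(f"each RRSIG shrinks by {rsa_2048_sig - ecdsa_p256_sig} bytes ({saving:.0%})")
```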
            <figure>
            
            <img src="https://cf-assets.www.cloudflare.com/zkvhlag99gkb/6b4SxPn3YcA4X85aRfcXox/bda50db841b3f37c77d57124ad47979e/www-cloudflare-dnssec-auth-com---DNSViz-2015-01-29-01-47-50.png" />
            
            </figure><p><i>A visualization of the signatures on our domain. Source: </i><a href="http://dnsviz.net/d/www.cloudflare-dnssec-auth.com/dnssec/"><i>DNSViz</i></a></p><p>So let us know how this domain loads and validates for you. We’ll make sure to get you some stickers if you find an obscure bug!</p><p><i>UPDATE: The beta is full, thanks to all who are participating.</i></p><p>P.S. If you are a DNSSEC enthusiast and you want to be part of the public beta, just send an email to <a>dnssec-beta@cloudflare.com</a> with the name of your website and the answer to this question - the first ten people get in:</p><blockquote><p><s>What is the DNSSEC algorithm number for ECDSAP256SHA256?</s></p></blockquote> ]]></content:encoded>
            <category><![CDATA[DNSSEC]]></category>
            <category><![CDATA[Beta]]></category>
            <category><![CDATA[Reliability]]></category>
            <category><![CDATA[Programming]]></category>
            <guid isPermaLink="false">7sV3lRq0NXK6c6RhZdZYz9</guid>
            <dc:creator>Filippo Valsorda</dc:creator>
        </item>
    </channel>
</rss>