What's inside net/http? Late binding in the Go standard library

2015-12-21

It's well known that we're heavy users of the Go programming language at CloudFlare. Our work often involves delving into the standard library source code to understand internal code paths, error handling and performance characteristics.

Recently, I looked at how the standard library's built-in HTTP client handles connections to remote servers in order to provide minimal roundtrip latency.

[Image: athletics track. CC BY 2.0, image by Dean Hochman]

Connection pooling

A common way to avoid connection setup costs (such as the TCP handshake and TLS setup), and to control the number of concurrently established connections, is to pool them. net/http maintains a pool of connections to each remote host that supports Connection: keep-alive. By default the pool holds two idle connections per remote host (http.DefaultMaxIdleConnsPerHost); the limit can be changed through the Transport's MaxIdleConnsPerHost field.

More interestingly, when you make a request with net/http, a race happens. Races in code are often an unwanted side effect, but in this case it's intentional. Two things happen concurrently: a new goroutine tries to dial a connection to the remote host, while the calling goroutine waits for an idle connection to become available from the connection pool. Whichever completes first wins.

To illustrate, let's look at an excerpt of the code executed when transport.RoundTrip(req) is called (it lives in the Transport's getConn method):

go func() {
    pc, err := t.dialConn(cm)
    dialc <- dialRes{pc, err}
}()

idleConnCh := t.getIdleConnCh(cm)

select {
case v := <-dialc:
    // Our dial finished.
    return v.pc, v.err
case pc := <-idleConnCh:
    // Another request finished first and its net.Conn
    // became available before our dial. Or somebody
    // else's dial that they didn't use.
    // But our dial is still going, so give it away
    // when it finishes:
    handlePendingDial()
    return pc, nil
case <-req.Cancel:  
    handlePendingDial()
    return nil, errors.New("net/http: request canceled while waiting for connection")
case <-cancelc:  
    handlePendingDial()
    return nil, errors.New("net/http: request canceled while waiting for connection")
}

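First a dial is started in its own goroutine, then we select over idleConnCh (down which idle connections are passed) and dialc, which gives us the result of the dial. The remaining two cases handle cancellation, which is possible if the caller decides RoundTrip has taken too long. Whenever the dial loses the race, handlePendingDial makes sure the connection it eventually produces is given away rather than wasted.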

Late binding

So if an idle connection is available, retrieving it from the pool should win against dialing a new one. A similar approach (alongside other optimizations) has been used in Chromium, where it's referred to as late binding.

This echoes a mechanism we use in Railgun, CloudFlare's dynamic content optimizer, to ensure that an incoming request is serviced as quickly as possible. Idle connections to the Railgun component running at an origin server are periodically pruned after a timeout or may be closed because of errors, so an established connection from CloudFlare's edge is not always available. In this case, a direct request is made without Railgun whilst, in parallel, a persistent connection is initiated for use by subsequent requests.

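As long as you have confidence that your connection manager is capable of cleaning up bad connections and cancelled requests properly, connection pooling and late binding can be important in reducing roundtrip latency. sync.Pool in the standard library may be a useful starting point if you need to pool something other than HTTP connections, with one caveat: it may discard pooled items at any time without notification, so it suits reusable temporary objects such as buffers better than resources that must be explicitly closed.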
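For pooling short-lived objects, a minimal sync.Pool sketch might look like the following. The bufPool variable and the render function are hypothetical names chosen for illustration; the pattern shown is reuse of bytes.Buffer values, not connection pooling.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers. sync.Pool may drop items at any
// time, so New must always be able to create a fresh one.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// render builds a greeting using a pooled buffer.
func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()            // a recycled buffer may still hold old data
	defer bufPool.Put(buf) // return it for later reuse

	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String() // String copies, so the result outlives the buffer
}

func main() {
	fmt.Println(render("gopher"))
	fmt.Println(render("railgun"))
}
```

Resetting the buffer on Get rather than on Put keeps the function correct even if some other caller returns a buffer without clearing it.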

If you found this exploration interesting, we're hiring engineers in London, San Francisco and Singapore.
