Mobile web browsing is very different, at the network level, from browsing on a desktop machine connected to the Internet. Yet both use the same protocols, and although TCP was designed to perform well on the fixed-line Internet, it doesn't perform as well on mobile networks. This post looks at why, and at how CloudFlare is helping.
We start with a simple ping. Here's a ping from my laptop (connected via 802.11g WiFi to a 20Mbps broadband connection) to a machine at Google. Looks like I'm getting a round-trip time of about 20ms.
Here's the same ping done from my iPhone on the same WiFi network at the same location in the house. The ping time has gone up to about 60ms, so in this instance the round-trip time has tripled just by going from laptop to phone.
But to see the real cost of mobile it's necessary to switch off WiFi and move onto 3G. Here's the ping time on 3G to the same machine. Here it's both much higher (we're now into 1/10 to 1/5 of a second territory) and much more variable.
And then I get up and move to the front of the house and try again. The ping time has changed completely (the number of bars didn't) and I'm seeing between 0.5s and 1s of round trip time. That will have a serious effect on web browsing.
And for a final test I return to my original location and grip the iPhone firmly in my hand. The number of bars falls away and the round trip time becomes infinite! Pings simply aren't working any more.
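If you'd like to reproduce this sort of measurement from a script rather than with ping (which uses ICMP), one rough approximation is to time TCP handshakes instead. Here's a small Python sketch along those lines; the hostname is just a stand-in for the machine at Google I pinged above:

```python
import socket
import time

def tcp_rtt_ms(host, port=80, samples=5):
    """Roughly estimate round-trip time by timing TCP handshakes.

    A connect() costs one round trip (SYN out, SYN-ACK back), so the
    elapsed time is a fair stand-in for what ping reports.
    """
    times = []
    for _ in range(samples):
        start = time.time()
        sock = socket.create_connection((host, port), timeout=5)
        times.append((time.time() - start) * 1000.0)
        sock.close()
    return times

if __name__ == "__main__":
    # "www.google.com" is just a stand-in for the machine pinged above.
    for rtt in tcp_rtt_ms("www.google.com"):
        print("%.1f ms" % rtt)
```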
What this illustrates is something that any smartphone user knows instinctively: network performance on a phone is highly variable and susceptible to location and environment. TCP would actually work just fine on a phone except for one small detail: phones don't stay in one location. Because they move around while using the Internet, the parameters of the network between the phone and the web server (such as the latency) keep changing, and TCP wasn't designed to detect the sort of change that's happening.
In past posts I've looked at the effect of high latency on web browsing and at TCP's connection and slow start cost. One of the fundamental parts of the TCP specification covers congestion avoidance: the detection and avoidance of congestion on the Internet. At the start of a connection, TCP's slow start stops it from blasting out packets until it has discovered the maximum speed at which it can transmit, and during a connection TCP actively watches for signs of congestion. The smooth running of the Internet as a whole relies on protocols like TCP being able to detect congestion and slow down; if they couldn't, there would likely be a congestion collapse.
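To make slow start and congestion avoidance concrete, here's a toy simulation (nothing like real kernel code) of the classic pattern: the congestion window doubles every round trip during slow start, then grows linearly once it crosses the slow start threshold.

```python
def simulate_cwnd(rtts=10, ssthresh=32):
    """Toy model of slow start followed by congestion avoidance.

    cwnd is the congestion window in segments: it doubles every
    round trip during slow start, then grows by one segment per
    round trip once it reaches the slow start threshold.
    """
    cwnd = 1
    for rtt in range(rtts):
        print("RTT %2d: cwnd = %2d segments" % (rtt, cwnd))
        if cwnd < ssthresh:
            cwnd *= 2      # slow start: exponential growth
        else:
            cwnd += 1      # congestion avoidance: linear growth

simulate_cwnd()
```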
TCP spots congestion by watching for lost packets. On the wired Internet lost packets are a sign of congestion: they indicate that a buffer in a router or server somewhere along the route is full and is simply dropping packets. When TCP detects lost packets, it slows down.
That all falls apart on mobile networks, because packets get lost for other reasons: you move around your house while browsing, or you're on the train, or you just block the signal some other way. When that happens it's not congestion, but TCP thinks it is and reacts by slowing down the connection.
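As a rough sketch of that reaction, loosely modelled on classic Reno behaviour and heavily simplified: on a loss signal the sender cuts its window in half, whether the loss came from a full router buffer or from a hand wrapped around a phone.

```python
def on_loss_signal(cwnd, mss=1):
    """Simplified Reno-style reaction to a loss signal.

    TCP can't tell *why* a packet was lost, so it assumes congestion:
    it halves the congestion window and carries on from there. On a
    radio link where the loss was just interference or signal blockage,
    this throttles the connection for no good reason.
    """
    ssthresh = max(cwnd // 2, 2 * mss)  # halve the sending window
    return ssthresh, ssthresh           # new (ssthresh, cwnd)

print(on_loss_signal(40))  # -> (20, 20): the sending rate is cut in half
```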
It seems like it might be a simple matter to change TCP's congestion avoidance algorithm to take into account the challenges of mobile networks, but it's actually an area of active research, with many possible replacements for the existing basic algorithm. It's hard because balancing maximum throughput, preventing congestion on the Internet, dealing with actual congestion, and spotting phony congestion is genuinely complex.
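It's worth noting that on Linux the congestion control algorithm is pluggable per socket, which is part of why new algorithms can be experimented with without changing applications. As an illustration (Linux only, Python 3.6 or later, and the named algorithm has to be available in the kernel):

```python
import socket

# Create a TCP socket and ask the kernel to use a specific pluggable
# congestion control module for it. "cubic" is the usual Linux default;
# which alternatives are available depends on the kernel configuration.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"cubic")

# Read back the algorithm actually in use for this socket.
name = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16)
print(name.split(b"\x00")[0].decode())  # -> "cubic"
```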
And if that weren't enough, mobile networks introduce another tricky problem: packet reordering. Although TCP is designed to cope with some reordering of packets (they might have followed different routes between source and destination), large-scale reordering can occur in mobile networks when a phone is handed off from one tower to the next.
For example, a stream of packets transmitted by a moving mobile user (perhaps sending a large email) might be split, with some packets travelling via one tower along one route and the rest via a different tower along a different route.
This causes problems for some of the newer congestion avoidance algorithms (such as TCP New Reno) and can lead to additional slowdowns.
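A toy model shows why: a receiver acknowledges the highest in-order segment it has seen, so segments arriving out of order generate duplicate ACKs, and a run of duplicate ACKs is normally read as a sign of loss even when nothing was dropped. This is a sketch, not real stack code:

```python
def count_dup_acks(arrival_order):
    """Count duplicate ACKs produced by a given segment arrival order.

    A receiver cumulatively ACKs the next in-order segment it expects,
    so every out-of-order arrival repeats the previous ACK. Three
    repeats of the same ACK look like a lost packet to a Reno-style
    sender, even if every segment eventually arrives.
    """
    expected = 0
    received = set()
    dup_acks = 0
    for seg in arrival_order:
        received.add(seg)
        if seg == expected:
            while expected in received:  # advance the cumulative ACK
                expected += 1
        else:
            dup_acks += 1                # out of order: duplicate ACK
    return dup_acks

# Segments 0-5 split across two towers and recombined out of order:
# three duplicate ACKs, enough to trigger fast retransmit, with no loss.
print(count_dup_acks([0, 3, 4, 5, 1, 2]))  # -> 3
```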
CloudFlare helps solve these problems for our customers in two ways. First, we customize the parameters inside the TCP stacks on our web servers to tune for the best possible performance; second, we actively monitor and classify the connections from people surfing our customers' sites.
By classifying connections we can dynamically determine the best way to behave on each one. We know whether it is likely to be a high-latency mobile phone browsing session or a high-bandwidth broadband connection in someone's home or office. That allows us to give the best performance to end users and ensure that customers' web sites are snappy wherever and however they are accessed.
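What we actually do is more involved than a blog-sized example, but as a toy illustration of the idea, imagine bucketing connections by the round-trip time observed during the TCP handshake and adjusting behaviour accordingly (every threshold and label below is made up for illustration, not our real classification logic):

```python
def profile_for_connection(handshake_rtt_ms):
    """Toy connection classifier: choose tuning hints from observed RTT.

    The thresholds and labels here are invented for illustration;
    they are not CloudFlare's actual classification logic.
    """
    if handshake_rtt_ms < 50:
        return {"class": "broadband", "assume": "stable, low latency"}
    elif handshake_rtt_ms < 200:
        return {"class": "slower link", "assume": "moderate latency"}
    else:
        return {"class": "high-latency mobile",
                "assume": "variable latency; losses may not be congestion"}

print(profile_for_connection(20))   # the laptop-on-WiFi case above
print(profile_for_connection(600))  # the 3G-at-the-front-of-the-house case
```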
And we continually look at ways of improving network performance for our customers by tuning TCP, monitoring performance, opening new data centers and introducing features like Rocket Loader, Mirage, Polish, SPDY, and Railgun.