If you've been following recent news about technical spying by the US National Security Agency and the UK's Government Communications Headquarters, you may have come across a claim that the NSA was involved in weakening a random number generator. The obvious question to ask is... why mess with random number generation?
The answer is rather simple: good random numbers are fundamental to almost all secure computer systems. Without them, everything from Second World War ciphers like Lorenz to the SSL your browser uses to secure web traffic is in serious trouble.
To understand why, and the threat that bad random numbers pose, it's necessary to understand a little about random numbers themselves (such as "what is a good random number anyway?") and how they are used in secure systems.
A Hacker News Hack
As an example of how random numbers go wrong I'll begin with a hack of the popular programming and technology web site Hacker News.
Four years ago I mentioned on the site that its random number generator was vulnerable to being used to attack the site. Not long after, and entirely independently, another contributor to the site actually carried out the attack with the permission of the site owner.
Here's how it worked. When you log into a web site you are typically assigned a unique ID for that session (the period you are logged in). That unique ID needs to be unique to you and not guessable by someone else. If someone else can guess it they can impersonate you.
In the case of Hacker News, the unique ID is a string of random characters such as lBGn0tWMcx7380gZyrUO9B. Each logged in user has a different string and the strings should be very, very difficult to guess or figure out.
The IDs are generated internally using a pseudo-random number generator. That's a mathematical function that can be called repeatedly to get apparently random numbers. I say apparently because, as the great mathematician John von Neumann said: "Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin." The computer scientist Donald Knuth tells a story of inventing a pseudo-random number generator himself only to be shocked at how poor it was.
Although pseudo-random number generators can generate a sequence of apparently random numbers they have weaknesses.
von Neumann used a simple pseudo-random number generator called the middle-square method, which works as follows. You start with some number (called a seed) and square it. You take the four middle digits of the square as your random number, then square that number to get the next random number, and so on.
For example, if you chose 4181 as a seed, the sequence 4807, 1072, 1491, 2230, 9729, ... would be generated as follows:

Random number   Its square   Middle digits
4181            17480761     4807
4807            23107249     1072
1072            1149184      1491
1491            2223081      2230
2230            4972900      9729
9729            94653441     6534
and so on

(Squares with fewer than eight digits are padded with leading zeros before the middle digits are taken.)
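The middle-square steps described above can be sketched in a few lines of Python. This is only an illustration: the function name middle_square and its parameters are made up for this example, and it assumes the square is zero-padded to eight digits before the middle four are taken.

```python
def middle_square(seed, count, digits=4):
    """Return `count` numbers produced by the middle-square method."""
    values = []
    current = seed
    for _ in range(count):
        # Zero-pad the square to 2 * digits characters so "the middle" is well defined.
        squared = str(current * current).zfill(2 * digits)
        start = digits // 2
        current = int(squared[start:start + digits])
        values.append(current)
    return values


print(middle_square(4181, 6))  # [4807, 1072, 1491, 2230, 9729, 6534]
```

Notice that the function needs nothing but the previous output to compute the next one, which is exactly why anyone who sees a single number can predict all the rest.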
This particular pseudo-random number generator has long since been replaced by better ones, such as the Mersenne Twister, whose output is harder to predict. The middle-square method is trivial to predict: the next number it generates is entirely determined by the number it last produced. The Mersenne Twister, on the other hand, is much harder to predict because it keeps internal state that it uses to produce random numbers, and a single output does not reveal that state.
In the world of cryptography there are cryptographically secure pseudo-random number generators, which are designed to be unpredictable no matter how many random numbers you ask them to generate. (The Mersenne Twister isn't cryptographically secure because it can be predicted if enough of the random numbers it generates are observed.)
For secure systems it's vital that the random number generator be unpredictable.
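As a small illustration of what "pseudo-random" means in practice, here is a sketch using Python's standard random module, which is built on the Mersenne Twister. The helper name first_outputs and the seed 1234 are arbitrary choices for this example. The same seed always gives the same sequence, so the output is only as unpredictable as the seed and the internal state.

```python
import random


def first_outputs(seed, count=5):
    """Return the first `count` 32-bit outputs of a generator seeded with `seed`."""
    rng = random.Random(seed)  # CPython's random.Random is a Mersenne Twister
    return [rng.getrandbits(32) for _ in range(count)]


run_a = first_outputs(1234)
run_b = first_outputs(1234)
print(run_a == run_b)  # True: same seed, same "random" numbers
```

That reproducibility is useful for simulations and tests, and it is exactly what makes such a generator unsuitable for secrets.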
Starting With A Seed
All pseudo-random number generators need to start somewhere: they need to be seeded, and that's where Hacker News failed. The random number generator was seeded with the time in milliseconds when the Hacker News software was last started. By some careful work, the attacker managed to make Hacker News crash and could then predict when it restarted within a window of about one minute. From that he was able to predict the unique IDs assigned to users as they logged in and could, therefore, impersonate them. (Similar random number problems enabled one group of people to cheat at online poker.)
The full details of how the Hacker News Hack worked are here. The attack worked because once Hacker News crashed the attacker would wait for it to start and note the current time. Amusingly, the Hacker News server was willing to give out that information. The attacker then had 60s worth of possible seeds (60,000 seeds since the seed was in milliseconds).
So, the attacker would log in and look at their own unique ID. It had been generated by random numbers inside Hacker News's software. He then tried out each of the 60,000 seeds and ran the random number generation algorithm used by Hacker News until he found a match with his own unique ID. That told him which seed had been used, and it let him keep generating further unique IDs by generating the same sequence of random numbers that Hacker News was using. From that he could predict the unique IDs given out to users as they logged in and he could then impersonate them.
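The seed search can be sketched as follows. This is not the Hacker News code or the attacker's actual tool: it uses Python's random module as a stand-in generator, and the names session_id and recover_seed, the alphabet, and the timestamps are all assumptions made up for illustration. It also ignores other users logging in between the attacker's login and the victim's, which a real attack would have to account for.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits


def session_id(rng, length=22):
    """Draw one ID of `length` characters from the generator `rng`."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def recover_seed(observed_id, window_start_ms, window_ms=60_000):
    """Try every millisecond seed in the window until the first ID matches."""
    for seed in range(window_start_ms, window_start_ms + window_ms):
        if session_id(random.Random(seed)) == observed_id:
            return seed
    return None


# Simulated server, seeded with a hypothetical start time in milliseconds.
server_seed = 1_240_000_012_345
server_rng = random.Random(server_seed)
attacker_id = session_id(server_rng)  # the ID the attacker gets on login
victim_id = session_id(server_rng)    # the next ID, handed to another user

# The attacker knows the restart time only to within a minute.
found = recover_seed(attacker_id, window_start_ms=1_240_000_000_000)
clone = random.Random(found)
session_id(clone)                     # skip past the attacker's own ID
predicted = session_id(clone)
print(found == server_seed, predicted == victim_id)
```

Searching at most 60,000 seeds takes well under a second on an ordinary computer, so the one-minute uncertainty offers no real protection.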
The Hacker News code was changed to use the Linux /dev/urandom source of random numbers which means that today unique IDs are generated with a good random number generator and without the weak seed previously used.
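For comparison, here is a minimal sketch of generating a session ID from the operating system's random source in Python. It is not the Hacker News code; the function name new_session_id and the choice of 16 random bytes are assumptions for this example. Python's secrets module draws from the operating system's generator rather than from a seeded Mersenne Twister, so there is no start-up seed for an attacker to guess.

```python
import secrets


def new_session_id():
    """Return a URL-safe ID built from 16 bytes of operating-system randomness."""
    return secrets.token_urlsafe(16)


print(new_session_id())
```

Sixteen random bytes encode to a 22-character string, the same length as the example ID shown earlier.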
So, there are two ways in which pseudo-random number generation can fail: the seed could be bad or the algorithm itself could be weak and predictable.
Random Numbers Everywhere
The Hacker News example isn't about cryptography itself, but random numbers are vital to cryptographic schemes. For example, any HTTPS session starts as follows:
The web browser sends information to the server about which version of SSL it wants to use and other information.
The web server replies with similar information about SSL versions and its SSL certificate.
The web browser checks that the certificate is valid. If it is, it generates a random 'pre-master secret' that will be used to secure the connection.
After that further exchanges occur all based on the randomly chosen pre-master secret. It needs to be unpredictable for the connection to be secure.
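To make the pre-master secret less abstract, here is a hedged sketch of how one could be built, assuming the RSA key exchange as specified for TLS 1.2, where the secret is 48 bytes: two bytes giving the client's protocol version followed by 46 random bytes. The function name make_pre_master_secret is made up for this example, and a real browser would go on to encrypt the value with the server's public key before sending it.

```python
import secrets


def make_pre_master_secret(client_version=(3, 3)):
    """Build a 48-byte secret: 2 version bytes (3, 3 means TLS 1.2) + 46 random bytes."""
    return bytes(client_version) + secrets.token_bytes(46)


pms = make_pre_master_secret()
print(len(pms), pms[:2].hex())
```

If those 46 bytes came from a predictable generator, an eavesdropper who could reproduce them would be able to derive the same session keys as the browser and the server.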
Here's part of how a computer using WiFi establishes a secure connection to an access point using the popular WPA2 protocol:
The access point generates a random nonce and sends it to the computer.
The computer generates a random nonce and sends it to the access point.
The access point and the computer continue on from there using those random nonce values to secure the connection.
Similarly, random numbers turn up when logging into web sites (and other systems), creating secure connections to servers using SSH, holding Skype video chats, sending encrypted email and more.
And the Achilles' heel of the only completely secure cryptosystem, the one-time pad, is that the pad itself must be completely randomly generated. Any predictability or non-uniformity in the random numbers used can lead to the breaking of a one-time pad. (The other problem with one-time pads is reuse: they must be used only once.)
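A one-time pad is simple enough to sketch in a few lines. The example below is an illustration only, with made-up messages and a helper called xor_bytes; it combines each message byte with a pad byte using XOR, and then shows why reuse is fatal: XORing two ciphertexts made with the same pad cancels the pad out entirely.

```python
import secrets


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


message_1 = b"ATTACK AT DAWN"
message_2 = b"RETREAT AT SIX"
pad = secrets.token_bytes(len(message_1))  # the pad must be as long as the message

cipher_1 = xor_bytes(message_1, pad)
decrypted = xor_bytes(cipher_1, pad)       # applying the pad again decrypts

cipher_2 = xor_bytes(message_2, pad)       # the mistake: using the pad twice
leak = xor_bytes(cipher_1, cipher_2)
print(decrypted == message_1, leak == xor_bytes(message_1, message_2))
```

The leaked value depends only on the two messages, not on the pad, which gives a codebreaker a foothold even though the pad itself was perfectly random.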
CloudFlare's Random Number Source
At CloudFlare we need lots of random numbers for cryptographic purposes: we need them to secure SSL connections and Railgun, to generate public/private key pairs, and for authentication systems. They are an important part of forward secrecy, which we've rolled out for all our customers.
We currently obtain most of our random numbers from either OpenSSL's random number generation system or from the Linux kernel. Both seed their random number generators from a variety of sources to make them as unpredictable as possible. Sources include things like network data, or the seek time of disks. But we think we can improve on them by adding some truly random data into the system, and, as a result, improve security for our customers.
We've embarked on a project to further improve our random numbers by providing a source of truly random numbers that don't come from a mathematical process. That can be done using things like radioactive decay, the motion of fluids, atmospheric noise, or other chaos.
We'll be posting details of the new system when it's online.