So Random (Image Copyright (c) Walt Disney)

If you've been following recent news about technical spying by the US National Security Agency and the UK's Government Communications Headquarters, you may have come across a claim that the NSA was involved in weakening a random number generator. The obvious question to ask is... why mess with random number generation?

The answer is rather simple: good random numbers are fundamental to almost all secure computer systems. Without them everything from Second World War ciphers like Lorenz to the SSL your browser uses to secure web traffic is in serious trouble.

To understand why, and the threat that bad random numbers pose, it's necessary to understand a little about random numbers themselves (such as "what is a good random number anyway?") and how they are used in secure systems.

A Hacker News Hack

As an example of how random numbers go wrong I'll begin with a hack of the popular programming and technology web site Hacker News.

Hacker News

Four years ago I mentioned on the site that its random number generator was vulnerable to being used to attack the site. Not long after, and entirely independently, another contributor to the site actually carried out the attack with the permission of the site owner.

Here's how it worked. When you log into a web site you are typically assigned a unique ID for that session (the period you are logged in). That unique ID needs to be unique to you and not guessable by someone else. If someone else can guess it they can impersonate you.

In the case of Hacker News, the unique ID is a string of random characters such as lBGn0tWMcx7380gZyrUO9B. Each logged in user has a different string and the strings should be very, very difficult to guess or figure out.

Pseudo-randomness

The IDs are generated internally using a pseudo-random number generator. That's a mathematical function that can be called repeatedly to get apparently random numbers. I say apparently because, as the great mathematician John von Neumann said: "Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin." The computer scientist Donald Knuth tells a story of inventing a pseudo-random number generator himself only to be shocked at how poor it was.

Although pseudo-random number generators can generate a sequence of apparently random numbers they have weaknesses.

The Three

von Neumann used a simple pseudo-random number generator called the middle square that works as follows. You start with some number (called a seed) and square it. You take the four middle digits as your random number and square them to get the next random number, and so on.

For example, if you chose 4181 as a seed the sequence 4807, 1072, 1491, 2230, 9729, ... would be generated as follows:

     Random number        Its Square    Middle digits
     4181                 17480761      4807
     4807                 23107249      1072
     1072                  1149184      1491
     1491                  2223081      2230
     2230                  4972900      9729
     9729                 94653441      6534
     and so on
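The table above can be reproduced with a few lines of Python. This is only a sketch of the four-digit variant described here; the function name middle_square is mine, and squares shorter than eight digits are padded with leading zeros before the middle four digits are taken.

```python
def middle_square(seed: int, count: int) -> list[int]:
    """Return `count` outputs of the four-digit middle-square generator."""
    outputs = []
    value = seed
    for _ in range(count):
        # Pad the square to eight digits so there is always a "middle four".
        squared = str(value * value).zfill(8)
        # The middle four digits become both the output and the next input.
        value = int(squared[2:6])
        outputs.append(value)
    return outputs


# Reproduces the "Middle digits" column of the table for the seed 4181.
print(middle_square(4181, 6))
```

Notice that the function needs nothing but the previous output to compute the next one, which is exactly why anyone who sees a single number can predict all the rest.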

This particular pseudo-random number generator has long since been replaced by better ones such as the Mersenne Twister whose output is harder to predict. The middle square method is trivial to predict: the next number it generates is entirely determined by the number it last produced. The Mersenne Twister on the other hand is much harder to predict because it has internal state that it uses to produce random numbers.

In the world of cryptography there are cryptographically secure pseudo-random number generators which are designed to be unpredictable no matter how many random numbers you ask them to generate. (The Mersenne Twister isn't cryptographically secure because it can be predicted if enough of the random numbers it generates are observed.)
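To make the point about the Mersenne Twister concrete, here is a rough sketch that clones a generator purely from its output. It assumes CPython, whose random module is documented to use the Mersenne Twister (MT19937), and it relies on getrandbits(32) returning one raw 32-bit output per call. The helper names are my own. Each output is a reversible "tempering" of one word of internal state, so 624 consecutive outputs reveal the whole state.

```python
import random


def undo_right_shift_xor(y: int, shift: int) -> int:
    """Invert x ^ (x >> shift) for a 32-bit x."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ (x >> shift)
    return x


def undo_left_shift_xor(y: int, shift: int, mask: int) -> int:
    """Invert x ^ ((x << shift) & mask) for a 32-bit x."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ ((x << shift) & mask)
    return x & 0xFFFFFFFF


def untemper(y: int) -> int:
    """Recover one word of MT19937 state from one 32-bit output."""
    y = undo_right_shift_xor(y, 18)
    y = undo_left_shift_xor(y, 15, 0xEFC60000)
    y = undo_left_shift_xor(y, 7, 0x9D2C5680)
    y = undo_right_shift_xor(y, 11)
    return y


# The "victim" generator; the attacker never sees this seed.
victim = random.Random(2013)
observed = [victim.getrandbits(32) for _ in range(624)]

# Rebuild the internal state from the observed outputs and load it
# into a fresh generator (624 is the position within the state array).
state = tuple(untemper(value) for value in observed) + (624,)
clone = random.Random()
clone.setstate((3, state, None))

predicted = [clone.getrandbits(32) for _ in range(10)]
actual = [victim.getrandbits(32) for _ in range(10)]
assert predicted == actual
```

A cryptographically secure generator is designed so that no such shortcut from output back to state exists.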

For secure systems it's vital that the random number generator be unpredictable.

Starting With A Seed

And all pseudo-random number generators need to start somewhere; they need to be seeded and that's where Hacker News failed. The random number generator was seeded with the time in milliseconds when the Hacker News software was last started. By some careful work, the attacker managed to make Hacker News crash and could then predict when it restarted within a window of about one minute. From it he was able to predict the unique IDs assigned to users as they logged in and could, therefore, impersonate them. (Similar random number problems enabled one group of people to cheat at online poker.)

The full details of how the Hacker News Hack worked are here. The attack worked because once Hacker News crashed the attacker would wait for it to start and note the current time. Amusingly, the Hacker News server was willing to give out that information. The attacker then had 60 seconds' worth of possible seeds (60,000 seeds, since the seed was in milliseconds).

So, the attacker would log in and look at their own unique ID. It had been generated by random numbers inside Hacker News's software. He then tried out each of the 60,000 seeds and ran the random number generation algorithm used by Hacker News until he found a match with his own unique ID. That told him which seed had been used, and it let him keep generating further unique IDs by generating the same sequence of random numbers that Hacker News was using. From that he could predict the unique IDs given out to users as they logged in and he could then impersonate them.
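The seed search can be sketched in Python. This is a simulation under loud assumptions, not the code used in the actual attack: Python's random.Random stands in for the generator Hacker News used, the make_id helper and the 62-character alphabet are hypothetical, and the attacker's ID is assumed to be the first one generated after the restart.

```python
import random
import string

# Hypothetical ID alphabet and length, chosen to resemble the example ID.
ALPHABET = string.ascii_letters + string.digits


def make_id(rng: random.Random, length: int = 22) -> str:
    """Generate one session ID from the given generator."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def recover_seed(observed_id: str, window_start_ms: int, window_ms: int = 60_000):
    """Try every millisecond in the window until one reproduces the ID."""
    for seed in range(window_start_ms, window_start_ms + window_ms):
        if make_id(random.Random(seed)) == observed_id:
            return seed
    return None


# Simulated server: seeded with its start time in milliseconds.
restart_ms = 1_250_000_000_000  # start of the attacker's one-minute window
true_seed = restart_ms + 41_337  # the real start time, unknown to the attacker
server = random.Random(true_seed)
attacker_id = make_id(server)  # the attacker logs in first
victim_id = make_id(server)  # then another user logs in

# The attacker searches the window using only their own ID.
found = recover_seed(attacker_id, restart_ms)

# Replaying the generator from the recovered seed predicts the next ID.
replay = random.Random(found)
make_id(replay)  # skip the attacker's own ID
predicted_victim_id = make_id(replay)
assert predicted_victim_id == victim_id
```

Sixty thousand candidates is a tiny search space, so the search finishes in seconds on an ordinary laptop.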

The Hacker News code was changed to use the Linux /dev/urandom source of random numbers which means that today unique IDs are generated with a good random number generator and without the weak seed previously used.
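In Python the equivalent of that fix is to draw IDs from the operating system's generator rather than from a seeded one. The sketch below uses the standard secrets module; the new_session_id helper, the alphabet, and the length of 22 are illustrative assumptions, not the Hacker News implementation.

```python
import secrets
import string

# Hypothetical ID alphabet, chosen to resemble the example ID format.
ALPHABET = string.ascii_letters + string.digits


def new_session_id(length: int = 22) -> str:
    """Build a session ID from the operating system's random source."""
    # secrets.choice draws from the OS generator, so there is no
    # application-level seed for an attacker to guess.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(new_session_id())
```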

So, there are two ways in which pseudo-random number generation can fail: the seed could be bad or the algorithm itself could be weak and predictable.

Random Numbers Everywhere

The Hacker News example isn't about cryptography itself, but random numbers are vital to cryptographic schemes. For example, any HTTPS session starts as follows:

  1. The web browser sends information to the server about which version of SSL it wants to use and other information.

  2. The web server replies with similar information about SSL versions and its SSL certificate.

  3. The web browser checks that the certificate is valid. If it is, it generates a random 'pre-master secret' that will be used to secure the connection.

After that further exchanges occur all based on the randomly chosen pre-master secret. It needs to be unpredictable for the connection to be secure.
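As a sketch of step 3, here is how a client might build the pre-master secret. It assumes the RSA key exchange layout from the TLS 1.2 specification, where the secret is 48 bytes: two bytes of protocol version followed by 46 random bytes. The function name is mine, and encrypting the result with the server's public key is left out.

```python
import secrets


def make_pre_master_secret(version: tuple[int, int] = (3, 3)) -> bytes:
    """Return a 48-byte pre-master secret: version bytes plus 46 random bytes."""
    # (3, 3) is how TLS 1.2 identifies itself on the wire.
    # The 46 random bytes must come from an unpredictable source, because
    # every key for the connection is derived from this value.
    return bytes(version) + secrets.token_bytes(46)


pre_master = make_pre_master_secret()
print(len(pre_master))
```

If those 46 bytes came from a generator seeded with something guessable, such as the time, an eavesdropper could regenerate the secret and decrypt the whole session.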

Here's part of how a computer using WiFi establishes a secure connection to an access point using the popular WPA2 protocol:

  1. The access point generates a random nonce and sends it to the computer.

  2. The computer generates a random nonce and sends it to the access point.

The access point and the computer continue on from there using those random nonce values to secure the connection.
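The idea of both sides mixing two nonces into a shared key can be sketched as follows. This is deliberately not the real WPA2 key derivation (which also mixes in the two MAC addresses and uses its own pseudo-random function); a single HMAC-SHA256 call stands in for it, and all the names are illustrative assumptions.

```python
import hashlib
import hmac
import secrets


def derive_session_key(shared_key: bytes, nonce_a: bytes, nonce_b: bytes) -> bytes:
    """Mix two nonces with a shared key (simplified stand-in, not WPA2's PRF)."""
    # Sort the nonces so both sides feed identical input in the same order.
    low, high = sorted((nonce_a, nonce_b))
    return hmac.new(shared_key, low + high, hashlib.sha256).digest()


# Stands in for the key both sides already hold from the WiFi passphrase.
shared_key = secrets.token_bytes(32)

ap_nonce = secrets.token_bytes(32)  # step 1: chosen by the access point
client_nonce = secrets.token_bytes(32)  # step 2: chosen by the computer

# Each side computes the key independently and arrives at the same value.
ap_key = derive_session_key(shared_key, ap_nonce, client_nonce)
client_key = derive_session_key(shared_key, client_nonce, ap_nonce)
assert ap_key == client_key
```

Because fresh nonces go into every handshake, each connection gets a different key even though the passphrase never changes. That only holds if the nonces really are unpredictable and never repeat.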

Similarly, random numbers turn up when logging into web sites (and other systems), creating secure connections to servers using SSH, holding Skype video chats, sending encrypted email and more.

Soviet one-time pad

And the Achilles' Heel of the only completely secure cryptosystem, the one-time pad, is that the pad itself must be completely randomly generated. Any predictability or non-uniformity in the random numbers used can lead to breaking of a one-time pad. (The other problem with one-time pads is reuse: they must be used only once.)
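A one-time pad is simple enough to sketch in a few lines. The example below XORs a message with a random pad of the same length, and then shows what goes wrong on reuse: XORing two ciphertexts made with the same pad cancels the pad out entirely. The messages and the xor_bytes helper are made up for illustration.

```python
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


message_1 = b"ATTACK AT DAWN"
message_2 = b"RETREAT AT TEN"

# The pad must be as long as the message and completely random.
pad = secrets.token_bytes(len(message_1))

ciphertext_1 = xor_bytes(message_1, pad)
assert xor_bytes(ciphertext_1, pad) == message_1  # decryption is the same XOR

# Reusing the pad for a second message is the classic mistake.
ciphertext_2 = xor_bytes(message_2, pad)

# The pad drops out, leaving the XOR of the two plaintexts.
leaked = xor_bytes(ciphertext_1, ciphertext_2)
assert leaked == xor_bytes(message_1, message_2)
```

An eavesdropper holding both ciphertexts learns the XOR of the two messages without ever seeing the pad, and that is often enough to recover both.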

CloudFlare's Random Number Source

At CloudFlare we need lots of random numbers for cryptographic purposes: to secure SSL connections and Railgun, to generate public/private key pairs, and to run authentication systems. They are an important part of forward secrecy, which we've rolled out for all our customers.

We currently obtain most of our random numbers from either OpenSSL's random number generation system or from the Linux kernel. Both seed their random number generators from a variety of sources to make them as unpredictable as possible. Sources include things like network data, or the seek time of disks. But we think we can improve on them by adding some truly random data into the system, and, as a result, improve security for our customers.

The sky above the port was the color of television, tuned to a dead channel

We've embarked on a project to further improve our random numbers by providing a source of truly random numbers that don't come from a mathematical process. That can be done using things like radioactive decay, the motion of fluids, atmospheric noise, or other chaos.

We'll be posting details of the new system when it's online.