(Image credit: jantik)
Over the last few weeks a number of companies have seen their password databases leaked onto the web and found that, despite having made some effort to protect them, many of the passwords were easily uncovered. Unfortunately, the disclosure of password databases is an ugly reality of the Internet: entire forums are dedicated to hackers who collaborate to uncover passwords from files, and specialized password cracking software is easy to obtain.
To understand password storage it's best to go back to basics and some history.
The simplest way to store a password is just to store it in a database. When a customer tries to log in and types in the password 'supersecret' that string is compared with the password in the database and the customer is or is not allowed in.
Of course, storing passwords in the clear (or in plain text) is very dangerous. If the database is compromised then the passwords can be read and every account can be broken into. Despite this danger there are many companies that store passwords in plain text. Some attempt to encrypt the password and then decrypt it when you log in. Although that's slightly better than a plain text password in the database, it only adds a small hurdle for a hacker: they just have to take the encryption key along with the database, and since the key is almost certainly on the same machine as the database, that is trivial to do.
Despite the poor security offered by encrypted or plain text passwords, many companies still use them. One surefire way to find out whether a site you are using does this is to ask for a password reset: if the company is able to email you your old password then it was stored insecurely.
If you're following along and are new to password security you may be asking yourself: how can you test someone's password when they want to log in if you don't store it in some way? It does seem like an unsolvable conundrum until you discover the cryptographic hash function (which I'll just shorten to hash function).
A hash function takes some string (such as a password) and turns it into a long number. In doing so it ensures two things: it's not possible to do the reverse (you can't take the number and run the algorithm backwards to get the string) and the number it generates is unique (i.e. there are no two strings that have the same number).
(Aside: I've simplified things a little in the previous paragraph. "not possible" should really be "infeasible" (i.e. you'd need to have more computers than there are on the planet to find the string) and "unique" should be "vanishingly improbable that two different strings will have the same number").
Hash functions work by taking the string to be hashed and scrambling the bits over and again to produce a number. One popular hash function is SHA-1. The SHA-1 hash of the password 'supersecret' is a761ce3a45d97e41840a788495e85a70d1bb3815 (the numbers are so long that they are typically written like this, in hexadecimal instead of decimal; in decimal that number is 955,582,595,971,963,915,918,670,633,711,507,401,334,868,097,045). The SHA-1 hash of 'Supersecret' (note the capital S) is 1b417472fc8e2a0a4d44ed43f874309ca4069099 (as you can see, it's totally different).
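The hashes above can be recomputed with a few lines of Python. This is a minimal sketch using the standard library's hashlib module; the helper name sha1_hex is an illustrative choice, and it assumes the password is encoded as UTF-8 before hashing.

```python
import hashlib

def sha1_hex(text: str) -> str:
    """Return the SHA-1 hash of text as a 40-character hexadecimal string."""
    # Assumption: the password is encoded as UTF-8 before hashing
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

lower = sha1_hex("supersecret")
upper = sha1_hex("Supersecret")

print(lower)           # hexadecimal form
print(upper)           # one changed letter gives an unrelated-looking hash
print(int(lower, 16))  # the same number written in decimal
```

Hashing the same string again always gives the same result, which is what makes the hash usable for comparisons later on.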
Hash functions are used for many purposes such as checking that the contents of a file haven't changed. When you download a file from the Internet its hash might also be sent so that your computer can check that no bits in the file have been accidentally flipped in transmission.
Hash functions are also often used in password systems because instead of storing the password, you can simply store the hash. Since the hash can't be easily reversed, the stored hash is a secure way of keeping the password. When a visitor comes to the site the hash of the password they entered is calculated and compared with the hash in the database. Since the hashes are unique they'll only be able to log in with the right password.
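The store-the-hash scheme just described might look something like the sketch below. Everything here is illustrative: the stored_hashes dictionary stands in for a real database table, and the function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical stand-in for the password table: username -> SHA-1 hex digest
stored_hashes = {}

def hash_password(password: str) -> str:
    return hashlib.sha1(password.encode("utf-8")).hexdigest()

def register(username: str, password: str) -> None:
    # Only the hash is kept; the password itself is never stored
    stored_hashes[username] = hash_password(password)

def check_login(username: str, password: str) -> bool:
    expected = stored_hashes.get(username)
    if expected is None:
        return False
    # compare_digest compares in constant time, avoiding timing leaks
    return hmac.compare_digest(expected, hash_password(password))

register("alice", "supersecret")
print(check_login("alice", "supersecret"))  # True
print(check_login("alice", "Supersecret"))  # False
```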
(Image credit: ToGa Wanderings)
Unfortunately, simply using a hash function like this is dangerous. Over the last few weeks a number of prominent Internet companies have found that their password databases have been cracked even though they 'hashed' their passwords. To see why, try Googling a761ce3a45d97e41840a788495e85a70d1bb3815. You might be surprised to find that the first result tells you that that's the SHA-1 hash of 'supersecret'.
The problem with simple hash functions is that hackers simply get a dictionary and compute all the hashes of all the possible passwords made from the dictionary. These massive databases of precomputed hashes are called rainbow tables. If a password database leaks then the hackers just look up the hashes in the rainbow table. The hashes that aren't found in the rainbow table correspond to those users who created long, complex passwords that weren't precomputed in this way. That's one reason why picking a long, complex password matters: hackers won't have already computed its hash.
Even though the hash function itself couldn't be reversed, it was possible to create a table of precomputed password hashes (especially for poorly chosen passwords).
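To make the attack concrete, here is a small sketch of a precomputed lookup. It uses a plain dictionary from hash to password (real rainbow tables use a more compact structure, but the effect for the attacker is the same). The word list and the leaked entries are made up for illustration.

```python
import hashlib

def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

# Hypothetical, tiny word list; real attacks use many millions of candidates
wordlist = ["password", "123456", "letmein", "supersecret", "qwerty"]

# Computed once, ahead of time: hash -> password
precomputed = {sha1_hex(word): word for word in wordlist}

# Hypothetical leaked database of unsalted hashes
leaked = {
    "alice": sha1_hex("supersecret"),
    "bob": sha1_hex("kV9#tq!2Lx_wR7pz"),
}

for user, digest in leaked.items():
    # A dictionary lookup is all it takes; no hashing happens at attack time
    print(user, precomputed.get(digest, "<not found>"))
```

Alice's password falls out immediately, while Bob's long, random password isn't in the table.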
The way around rainbow tables is with something called salt. Let's suppose you've picked the password 'supersecret' and company X is going to use SHA-1 to hash the password. Instead of simply hashing the password, company X picks a random salt (a random string of characters) that's unique to you (such as '$f2%38h##f23'). Instead of computing SHA-1(supersecret) they compute SHA-1(supersecret$f2%38h##f23) and get 33438b91ce09e6959232f698b7939e6ee1d0712a. Try Googling that and you won't get any results.
(Image credit: stlbites)
Since each user has some random salt applied to the hash, rainbow tables are useless. It's not possible to precompute the hashes of all the possible passwords with all the possible salt values.
Until recently a 'salted hash' like this was how CloudFlare stored user passwords.
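A salted hash of the kind described above can be sketched as follows. This is an illustration only, not CloudFlare's actual code: the record layout and function names are assumptions, and the salt is generated with Python's secrets module rather than picked by hand.

```python
import hashlib
import hmac
import secrets

def new_salt() -> str:
    # 16 random bytes, written as 32 hexadecimal characters
    return secrets.token_hex(16)

def salted_sha1(password: str, salt: str) -> str:
    # Same construction as in the text: SHA-1(password followed by salt)
    return hashlib.sha1((password + salt).encode("utf-8")).hexdigest()

# Hypothetical user record: the salt is stored next to the hash
salt = new_salt()
record = {"salt": salt, "hash": salted_sha1("supersecret", salt)}

def verify(password: str, record: dict) -> bool:
    candidate = salted_sha1(password, record["salt"])
    return hmac.compare_digest(record["hash"], candidate)

print(verify("supersecret", record))  # True
print(verify("Supersecret", record))  # False
```

The salt doesn't need to be secret. Its job is to make each user's hash different, so that a table computed for one salt is useless against any other.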
Unfortunately, password cracking techniques benefit enormously from two things: Moore's Law and the speed of hash functions. Hash functions weren't originally designed for protecting passwords. They were designed to check the integrity of data by detecting changes (notice how just changing from s to S in supersecret dramatically changed the SHA-1 hash above), and for that reason they were designed to be fast, very fast.
As computers have increased in speed with Moore's Law, the speed of hash functions has made it possible to do away with rainbow tables and start attacking passwords directly, even when salted. When a password database leaks, password cracking software is able to compute millions of hashes per second, applying the unique salt to each candidate password and checking the resulting hash value. The software literally tries out combinations of words and letters and computes the hash for each one.
That means that only long, complex passwords are safe with a salted hash.
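The direct attack can be sketched in a few lines. This toy version assumes the attacker has the leaked salt and hash, and it only tries short lowercase strings; real cracking software is far faster and uses word lists and mangling rules as well. The function names are illustrative.

```python
import hashlib
import itertools
import string

def salted_sha1(password: str, salt: str) -> str:
    return hashlib.sha1((password + salt).encode("utf-8")).hexdigest()

def crack(target_hash: str, salt: str, max_length: int = 3):
    """Try every lowercase string up to max_length; return the match or None."""
    for length in range(1, max_length + 1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            guess = "".join(letters)
            # The salt is in the leaked database, so it is simply reapplied
            if salted_sha1(guess, salt) == target_hash:
                return guess
    return None

salt = "$f2%38h##f23"  # the example salt from earlier
leaked_hash = salted_sha1("cat", salt)
print(crack(leaked_hash, salt))  # cat
```

Each extra character multiplies the number of guesses needed (by 26 here, and by more with a larger alphabet), which is why length and complexity matter so much.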
(Image credit: 4nitsirk)
The solution is to use a hash function that's slow. If the hash function itself is slow then it slows down cracking software. And if its speed is adjustable, the function can be made slower over time, so that password cracking doesn't get easier as computers get faster.
Happily, hash functions with just that property have been invented specifically to help keep passwords safe. We recently upgraded our entire password database to use bcrypt. bcrypt is just like a normal hash function but it has an additional parameter: as well as being fed the password and some random salt it's fed a cost. The cost tells the hash function how hard to work in computing the hash (and thus determines how long it will take).
Over time the cost can be increased (it's just a number) to keep pace with faster and faster computers and keep passwords safe by making the hash function slower and slower.
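Here is a rough sketch of the adjustable-cost idea. Python's standard library doesn't include bcrypt, so this example substitutes a different slow function, PBKDF2-HMAC-SHA256, whose iteration count plays the role of bcrypt's cost. The record layout, function names, and iteration values are all illustrative assumptions, not our production setup.

```python
import hashlib
import hmac
import os

def slow_hash(password: str, salt: bytes, iterations: int) -> bytes:
    # PBKDF2 stands in for bcrypt here; more iterations means a slower hash
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)

def make_record(password: str, iterations: int) -> dict:
    salt = os.urandom(16)
    # The cost is stored with the hash so that it can differ per user
    return {"salt": salt, "iterations": iterations,
            "hash": slow_hash(password, salt, iterations)}

def verify(password: str, record: dict) -> bool:
    candidate = slow_hash(password, record["salt"], record["iterations"])
    return hmac.compare_digest(candidate, record["hash"])

def login(password: str, record: dict, current_cost: int):
    """Return the (possibly upgraded) record on success, or None on failure."""
    if not verify(password, record):
        return None
    if record["iterations"] < current_cost:
        # The password is only available at login, so re-hash it now
        record = make_record(password, current_cost)
    return record

old = make_record("supersecret", iterations=10_000)  # illustrative values only
upgraded = login("supersecret", old, current_cost=50_000)
print(upgraded["iterations"])  # 50000
```

Because the cost is stored next to each hash, it can be raised gradually: old records keep working and are upgraded the next time their owner logs in.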
Just like all aspects of security, password storage needs to be reviewed from time to time. As we've seen recently, many companies don't take the time to upgrade their password security, leading to serious problems.
And, of course, users can help out too: password cracking relies partly on the algorithms used to store the passwords and partly on the complexity of the password. Make sure to choose a long, complex password and don't use it on any other site.