Defence in depth
Here’s a puzzle - what do medieval castles and modern-day websites have in common? It might seem an odd question but, when it comes to security, there's a lot that we can learn from history.
History showed that the strongest castles were those that had multiple layers of defence. They had moats, thick castle walls, a strong gatehouse, and archers repelling any armies. Any attackers managing to breach these would next have to get into the fortified keep, and then break into the secured rooms. Finally, if they got this far, they'd need to get past an experienced knight in order to get to their target - the king.
The same general principle applies to security today - a properly secured website will have many layers of defence in place to beat would-be hackers, such as firewalls, intrusion monitoring, and strong application-level security.
But what if hackers get past those and get inside? What can be done to protect passwords then? It's simple - websites don't really need to store any passwords at all. None. Zero. Nada.
Most websites should never need to store your actual password
But with no passwords stored, how can websites confirm a user's identity when they log in?
That's the clever bit.
Proving a user's identity
Passwords are something that only you should know - if you log in to a website with a password that matches the one you signed up with then you've proved your identity. And the easiest way for a website to check this is by simply comparing the typed password against a stored copy.
This, however, would require the website to save your raw password - never a good plan! The raw password is exactly what hackers would love to get hold of, and it's something websites don't actually need to store.
A more secure method - which all modern websites should be using - is to store a scrambled version of your password instead. Now when you log in, the password you enter is also scrambled (using the same method) and then checked against the saved copy. If they match, the website will log you in.
The scrambled password is generated by using a "hashing" algorithm. This creates a "password hash", a random-looking mixture of numbers & letters a bit like 33c17d3eeaf581e7d4749173b1680e51.
The cleverness of the method lies in the fact that the stored hash can’t be used in place of the password to log in. Anything entered as a password is always converted to a hash value before being checked, and hashing a hash produces a completely different value. Only the original password creates a matching hash; without it the website will simply refuse to log anyone in.
The key to making this all work is some clever mathematics that makes hashing a one-way process (a bit like scrambling eggs). This means that any criminals who do get hold of a hash are unable to reverse the hashing process, and so are unable to read the original password.
The hash (if it's a strong one, which we'll come onto in a bit) is therefore essentially useless to hackers, even if they manage to steal it.
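The sign-up and log-in check described above can be sketched in a few lines of Python. This is only an illustration: the names stored_hashes, sign_up and log_in are made up for this example, and plain SHA-256 is used purely to keep the sketch short (it has no salt and is far too fast for real password storage).

```python
import hashlib
import hmac

# Hypothetical in-memory "database": username -> stored password hash.
stored_hashes = {}

def hash_password(password: str) -> str:
    # Illustration only: unsalted SHA-256 is NOT suitable for real websites.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def sign_up(username: str, password: str) -> None:
    # Only the hash is saved; the raw password is never stored.
    stored_hashes[username] = hash_password(password)

def log_in(username: str, attempt: str) -> bool:
    stored = stored_hashes.get(username)
    if stored is None:
        return False
    # Whatever was typed is hashed first, then compared with the stored hash.
    return hmac.compare_digest(stored, hash_password(attempt))

sign_up("alice", "correct horse battery staple")
print(log_in("alice", "correct horse battery staple"))  # True
# Typing the stolen hash itself does not work: it gets hashed again.
print(log_in("alice", stored_hashes["alice"]))  # False
```

Note that the second attempt fails even though the attacker typed exactly what is in the database, because the website hashes whatever it is given before comparing.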
The hashing process
So what is this "hashing" process?
It’s basically a way of converting one sequence of characters (in this case your password) into another. Crucially it’s:
- One-way (you can’t reverse the process to find the original value from the hash), and
- Repeatable (the same input will always result in the same output)
This is all done with some clever mathematics - search the web for Hashing Algorithms if you’re interested. Some popular ones are MD5 (now considered insecure), SHA-256, and PBKDF2.
As an example, if a website uses the hashing algorithm MD5, then a password of [email protected] would be stored as 33c17d3eeaf581e7d4749173b1680e51.
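The two properties listed above are easy to see using Python's standard hashlib module. The password used here is a made-up example, not the one from this article, and MD5 appears only because it is the algorithm used in the example above.

```python
import hashlib

def md5_hex(text: str) -> str:
    # MD5 is shown for illustration only; it is considered insecure.
    return hashlib.md5(text.encode("utf-8")).hexdigest()

first = md5_hex("example-password")
second = md5_hex("example-password")
changed = md5_hex("example-passwore")  # last letter changed

print(first == second)   # True  (repeatable: same input, same output)
print(len(first))        # 32    (an MD5 hash is always 32 hex characters)
print(first == changed)  # False (a tiny change gives a different hash)
```

There is no matching "unhash" function in the library: the only way back to the password is to guess inputs and hash each guess.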
Making password hashing secure
Recovering passwords from hashes
Even though we can’t reverse the hashing process to recover the original password, there’s nothing to stop hackers from pre-calculating hash values for the most common passwords since the methods for calculating hash values are well known. Any stolen password hashes can then be compared to find a match and thus reveal the original password.
Some people have even pre-calculated the hash values for all possible combinations of letters, numbers, and punctuation symbols up to several characters long!
So if a hacker can just look up the original password for each hash, isn't storing the hash just as insecure as storing the original password? No, but only when it’s done correctly.
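To make the risk concrete, here is a minimal sketch of a pre-calculated lookup table, assuming a tiny hand-picked list of common passwords and unsalted MD5 hashes. Real tables are vastly larger, but the idea is the same.

```python
import hashlib

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Step 1: the attacker pre-calculates hashes for common passwords.
common_passwords = ["123456", "password", "qwerty", "letmein"]
lookup_table = {md5_hex(p): p for p in common_passwords}

# Step 2: stolen (unsalted) hashes are simply looked up in the table.
stolen_hashes = [
    md5_hex("qwerty"),
    md5_hex("a-much-less-common-passphrase"),
]
recovered = [lookup_table.get(h) for h in stolen_hashes]
print(recovered)  # ['qwerty', None]
```

The common password is recovered instantly, without ever reversing the hash. The uncommon one is not in the table, so the lookup finds nothing.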
Strengthening the hash
For hashing to be secure and effective, websites need to implement it properly. What's been missing from our explanation of hashing so far is both the concept of "salt" and the choice of hashing algorithm.
If you’re a website developer and want to know more then check out this great article on how to properly implement password hashing.
"Salt" (which is nothing to do with the white stuff you put on your food!) is the name given to a few randomly chosen characters that a website adds to your password before hashing it.
Because the salt is random, two identical passwords will end up with very different password hashes, which makes the pre-calculated tables worthless. Look at how this works in practice:
|Salt||Password to hash||MD5 hash|
|(no salt)||[email protected]||33c17d3eeaf581e7d4749173b1680e51|
See how just a small change will change the hash value dramatically?
The salt also effectively extends the length of the password (remember, the longer the password the more secure it is). This means that to pre-calculate the hash in order to look it up hackers would have to calculate many billions more combinations of letters and numbers.
We used 3 characters here to illustrate salting, but in reality we can make the salt as long as we want. This makes pre-calculating hashes effectively impossible. Clever, huh?!
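Here is a rough sketch of salting in Python, assuming a 16-byte random salt placed in front of the password. The function name salted_hash and the salt length are choices made for this example, and MD5 is used only to mirror the table above, not as a recommendation.

```python
import hashlib
import os

def salted_hash(password: str, salt: bytes) -> str:
    # The salt is combined with the password before hashing.
    # MD5 mirrors the article's example; do not use it for real passwords.
    return hashlib.md5(salt + password.encode("utf-8")).hexdigest()

# Two users pick the same password but get different random salts.
salt_a = os.urandom(16)
salt_b = os.urandom(16)
hash_a = salted_hash("same-password", salt_a)
hash_b = salted_hash("same-password", salt_b)

print(hash_a == hash_b)  # False: identical passwords, different hashes

# The salt is stored next to the hash so the check can be repeated at log-in.
print(salted_hash("same-password", salt_a) == hash_a)  # True
```

The salt does not need to be secret. Its job is to make every stored hash unique, so a table calculated in advance for unsalted passwords no longer matches anything.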
Choosing the right hashing algorithm
Websites that do password hashing properly are also careful in their choice of hashing algorithm. Many of the popular algorithms, such as SHA-256, are designed for speed. This is because hashing is used extensively throughout computing for many different purposes; not just passwords but also (for example) comparing large files and documents.
Password hashing is different - we actually want it to be slow! For this reason special password hashing algorithms ("key stretching algorithms") have been developed, such as PBKDF2 and bcrypt, which are designed to take many times longer to calculate.
Calculating the hash of a single password is still incredibly fast - a fraction of a second - but the combined effect when hashing hundreds of millions of passwords can add up to days or more. Why is this useful? These slower algorithms make it much more difficult for hackers to brute force your password, because every guess they try costs them the same extra time.
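As a sketch of what a slow, salted password hash can look like, the example below uses PBKDF2 from Python's standard library (hashlib.pbkdf2_hmac). The iteration count, salt length and function names are assumptions chosen for illustration; a real website should follow current guidance when picking them.

```python
import hashlib
import hmac
import os

# Illustrative value only: more iterations means slower guessing for attackers.
ITERATIONS = 200_000

def hash_password(password: str, salt: bytes = None) -> tuple:
    # A fresh random salt is generated unless one is supplied (for checking).
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    # Re-run the same slow calculation with the stored salt, then compare.
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)

salt, stored = hash_password("example-password")
print(verify_password("example-password", salt, stored))  # True
print(verify_password("wrong-guess", salt, stored))       # False
```

One log-in check is barely noticeable to a genuine user, but an attacker has to pay the same cost for every single guess against every single salt.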
Password hashing vs encryption
We've focused on hashing so far but are there other ways in which passwords can be stored? In a word, yes. They can also be stored "in the clear" (i.e. exactly as you type them), or be encrypted first.
Storing in the clear
By now it should hopefully be obvious that storing passwords in the clear is not at all secure. If a hacker breaches the website then they can access everyone's passwords without any further effort. Incredibly though, despite all the recent media coverage of websites being hacked, many websites still don't apply any protection to passwords and simply store them "in the clear".
It’s impossible to know which websites still do this since no one is ever going to own up to it. However, a few hacks have revealed some websites that were (at the time) guilty of this, including Sony, Plenty of Fish, and even one division of Microsoft!
Encryption is another way in which some websites try to protect passwords. Whilst hashing & encryption are both forms of "cryptography" (in other words, applying a mathematical transformation to a piece of data), the crucial difference is that encryption is reversible whereas hashing is not.
Recovering an original password from its encrypted form requires knowledge of a master password (a secret "decryption key"). Encryption therefore is only as good as the encryption method used, as well as the security of this key - if a hacker gets hold of this then they've effectively got all the passwords too.
In 2013 Adobe was hacked and over 130 million encrypted passwords were stolen. Whilst there’s no evidence that the decryption key was also taken, weaknesses in the encryption used have enabled many of the passwords to be identified.
For these reasons encrypting passwords is considered a poor security choice for websites to make, although it’s still a great deal better than storing your raw password with no transformation at all.
Hashing passwords properly, and storing them in this form, is these days widely considered to be the most secure method for protecting passwords.