How are passwords typically stored?
Most (correctly configured) applications and operating systems use a one-way cryptographic hash to store passwords. A one-way hashing algorithm (e.g. MD5, SHA1, SHA2) reduces a string of bytes of arbitrary length to a fixed length digest, or "hash." Hashes have properties that make it very useful for storing passwords:
There are other properties, however this should be sufficient as a high level introduction. Far more detailed information can be found on the Wikipedia page.
In its simplest form, the process for authenticating via password works like this:
User provides a password -> Run the password through a hashing algorithm -> output compared to previously stored hash value.
Why might we want a slightly different option?
One drawback to the irreversibility of a cryptographic hash, are instances when being able to produce original plaintext passwords would be useful. An example might be the desire to compare a proposed new password against a previously used one.
A better way (or a different one):
In a conversation I had on Twitter a few weeks back, the question was asked: "How could a vendor check similarity to a previous password without storing the passwords in clear text?"
For example, the website in question was able to do something like this:
How is this possible, given only the method described above? The original question is a good one, their assumptions (correctly) were:
Right? Yes. But - there is another way, read on...
Instead of storing the user's password as a one-way hash, use the password to generate a hash (hereafter referred to as a key) and use that key as input to an encryption algorithm, such as AES128, and use that algorithm to encrypt the user's password and profile data (and anything else you might want to keep secure, such as historical passwords.)
At it's most simple, just doing basic authentication with username / password the steps are this:
A diagram might be useful:
Taking this a step further, there's no reason to limit the encrypted data to just the user's username. The system could also store authorization to system resource data, and other sensitive data, such as: previous (plaintext) passwords.
Now the scenario from the Twitter conversation would work as follows:
Phew. That may seem like a huge number of steps, or that it is overly complicated - but does have the benefit of solving for the scenario we described - preventing users from slightly altering a recent historical password by storing encrypted copies of old passwords.
This could be a drawback too, as a compromised current password would give a would-be attacker access to a trove of old passwords. This can be mitigated by using two-factor authentication, using long passphrases, and using an adjustable memory-hard function (more on this shortly).
What about that KDF? And what is this memory hard business?
Properly implemented, this solution (and others) can use an adjustable KDF as a sort of throttle against offline-brute force attacks. This password-key-encryptedDB solution is just as vulnerable to offline password cracking as any other (See RACF passwords in John The Ripper or oclHashcat - they work exactly this way).
It's beyond the scope of this post to get deep into that, but simply put: An adjustable KDF has parameters that can be adjusted, that intentionally slow down the time it takes the CPU to compute the hash/key. Why is this desirable? Because to a user, a hash that takes 1 microsecond vs one second to compute isn't noticeable, but for a brute force operation, that will slow the number of operations down by a factor of 1,000,000. Going from a hypothetical billion hashes per second to 1,000 hashes per second. That is significant. If you want more information on how this works see: