I’m using a registration function that hashes the email with PBKDF2, using a random and unique salt each time. The hashed email and its salt are saved in the DB. No problem with that.
The problem is that now I want to make sure that users only create one single account per email. Obviously to verify that I need to check my DB and that’s where the problem starts. I either lose in security or in time.
Because as I see it I have 2 choices:
1 – I change my hash method and use a common salt for all emails, which makes me lose a bit of security.
Or
2 – I hash the email with all the Salts from the DB and check for matches. My guess is that this will be horribly slow.
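To make option 2 concrete, here is a minimal sketch of what that check would look like. It uses only the Python standard library; the iteration count, the lower-casing of the address, and the in-memory `rows` list standing in for the DB table are assumptions for illustration, not part of my actual code.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, for illustration only


def pbkdf2_email(email: str, salt: bytes) -> bytes:
    """Hash a normalized email address with PBKDF2-HMAC-SHA256."""
    normalized = email.strip().lower().encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", normalized, salt, ITERATIONS)


def is_registered(rows: list[tuple[bytes, bytes]], email: str) -> bool:
    """Option 2: re-hash the candidate with every stored salt.

    This costs one full PBKDF2 computation per stored account,
    so the check grows linearly with the size of the table.
    """
    return any(
        hmac.compare_digest(pbkdf2_email(email, salt), stored_hash)
        for salt, stored_hash in rows
    )


# Hypothetical stand-in for the users table: (salt, hash) pairs.
rows = []
for address in ("alice@example.com", "bob@example.org"):
    salt = os.urandom(16)
    rows.append((salt, pbkdf2_email(address, salt)))
```

With N accounts, every registration attempt pays for N PBKDF2 runs, which is the cost I am worried about.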
So my questions are:
- What should I do? Optimize security or “time”?
- Maybe hashing emails with unique and random salts in PBKDF2 is too much? If yes, what hashing method should I use?
- Is there any other solution?
PS: I didn’t post any code because I think this is more of a theory discussion but if code is needed let me know, I will add it.
22
Store a clear-text ‘digest’ of each email address alongside the hashed actual email. The digest should contain enough information to bring the number of candidates down to a reasonable handful (I’d say a factor of 1000 or more isn’t unrealistic), but not enough to guess the entire email address. For example, you could use the first two characters from the user part of the address, two characters from the domain name, and the TLD part; this would turn ‘[email protected]’ into ‘[email protected]’.
Finding collisions now becomes a two-step process: first, find all entries with matching digests, then do the actual hash comparison only on those. Instead of downloading all hashes from the DB for each check, you pre-filter them down to about 1/1000. That is a significant improvement, and while you trade some security for it, it’s better than either alternative.
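The two-step lookup could be sketched roughly as follows. This is only an illustration under stated assumptions: the digest format (`xx@yy.tld`), the iteration count, the lower-casing of addresses, and the list of dicts standing in for the DB table are all made up for the example, and `email_digest`, `hash_email` and `register` are hypothetical helper names.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, for illustration only


def email_digest(email: str) -> str:
    """Clear-text digest: 2 chars of the user part, 2 of the domain name, plus the TLD."""
    user, _, domain = email.strip().lower().partition("@")
    name, _, tld = domain.rpartition(".")
    return f"{user[:2]}@{name[:2]}.{tld}"


def hash_email(email: str, salt: bytes) -> bytes:
    """Salted PBKDF2-HMAC-SHA256 hash of the normalized address."""
    normalized = email.strip().lower().encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", normalized, salt, ITERATIONS)


def register(db: list[dict], email: str) -> bool:
    """Return False if the email already exists, otherwise store it and return True."""
    digest = email_digest(email)
    # Step 1: cheap pre-filter on the clear-text digest (an indexed column in a real DB).
    candidates = [row for row in db if row["digest"] == digest]
    # Step 2: run the expensive PBKDF2 comparison only against those candidates.
    for row in candidates:
        if hmac.compare_digest(hash_email(email, row["salt"]), row["hash"]):
            return False
    salt = os.urandom(16)
    db.append({"digest": digest, "salt": salt, "hash": hash_email(email, salt)})
    return True
```

In a real database the first step would be a `WHERE digest = ?` query on an indexed column, so only the matching rows ever leave the DB.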
1
If your purpose is to merely protect the emails, a simple hashing algorithm is all that is required. There is no point in even adding a random salt for making the system more secure, as you are not storing sensitive information.
For most hashing algorithms, the chance of a collision is negligible, so no two emails should produce the same hash. That way, you just need to make that column unique to detect duplicate emails.
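A minimal sketch of that idea, assuming SHA-256 as the "simple hashing algorithm" and an in-memory SQLite table (the table name, column name, and the lower-casing of addresses are assumptions for the example):

```python
import hashlib
import sqlite3


def email_hash(email: str) -> str:
    """Unsalted SHA-256 of the normalized address, as a hex string."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
# The UNIQUE constraint lets the database itself reject duplicate emails.
conn.execute("CREATE TABLE users (email_hash TEXT NOT NULL UNIQUE)")


def register(conn: sqlite3.Connection, email: str) -> bool:
    """Insert the hashed email; return False if it is already present."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO users (email_hash) VALUES (?)",
                (email_hash(email),),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Because the hash is unsalted, the same email always maps to the same value, which is what makes the unique constraint work.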