No Emails, Update 2

Thoughts on implementing a method for not storing email addresses while still allowing for password reset, Part 2.

Reposted from Learn JS the Hard Way By Zed A. Shaw

Note: This is a blog about my thoughts and experiences designing a complete JavaScript website from scratch including payments. The code is rough and being developed in the open for educational purposes. Grab the RSS or follow @lzsthw on Twitter if you want to follow along.

In my previous post I proposed the idea of using bcrypt to hash everyone's email the same way I plan to hash passwords. The idea is to not have raw emails exposed in the database, but still allow people to request a password reset. It's a small compromise compared to not storing email at all, which makes password resets impossible. The only problem I had was usernames. Someone forgetting their username would be out of luck, or so I thought.

After chatting with @grimmware I realized that it is possible to still allow people to forget their usernames. But first, a bit of back story confessional.

Well, Duh, It's a Hash

For some stupid reason I was thinking that bcrypt did an encryption operation that meant you couldn't compare hashes. I have no excuse for thinking this other than not enough coffee when I read the code. This idea actually doesn't even make sense unless the stored hash is some kind of key for the received password hash. I actually hadn't thought through this at all and just assumed you could not do this:

  1. bcrypt the email and store it.
  2. bcrypt a given email for the reset.
  3. Look up the user by the stored hash from #1 using the hash computed in #2 (both have to be hashed with the same salt for them to match).

Actually, this totally works just fine, so I would be able to recover people's username and password like this:

  1. User forgets username or password and visits the reset page.
  2. They give their email address for a one time reset link.
  3. Server hashes the given email with bcrypt(given_email).
  4. I query the database for the original bcrypt(stored_email) hash, and that's their user record.
  5. I send them an email with a reset link, and also remind them of their username.
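The steps above can be sketched roughly as follows. This is a minimal sketch under some loud assumptions: it uses Node's built-in `crypto.scryptSync` as a stand-in for bcrypt so it runs without dependencies, `SITE_SALT` is a hypothetical fixed site-wide salt, the `Map` is a hypothetical replacement for the real database, and lowercasing the email before hashing is my own assumption rather than something the design above specifies.

```javascript
const crypto = require('crypto');

// ASSUMPTION: one fixed, site-wide salt. With a random salt per record the
// same email would hash differently each time and the lookup could not match.
const SITE_SALT = 'hypothetical-site-wide-salt';

// Stand-in for bcrypt.hashSync(email, salt); scrypt is also deliberately slow.
function hashEmail(email) {
  // ASSUMPTION: normalize so "Zed@Example.com" and "zed@example.com" match.
  const normalized = email.trim().toLowerCase();
  return crypto.scryptSync(normalized, SITE_SALT, 32).toString('hex');
}

// Hypothetical in-memory "users table" keyed by the email hash.
const usersByEmailHash = new Map();

function register(username, email) {
  // Only the hash is kept; the raw email is never stored.
  usersByEmailHash.set(hashEmail(email), { username });
}

function findUserByEmail(givenEmail) {
  return usersByEmailHash.get(hashEmail(givenEmail)) || null;
}

register('zed', 'zed@example.com');
console.log(findUserByEmail('Zed@Example.com'));    // the record for zed
console.log(findUserByEmail('nobody@example.com')); // null
```

The trade-off of a shared salt is that identical emails always produce identical hashes, which is exactly what makes the lookup possible.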

This is incredibly simple, and still keeps the user's email private while letting them reset their password even if they forget their username. I honestly think I could stop there, except for one thing: Can this be vulnerable to a Denial of Service?

Hashing vs. Comparison Performance

Imagine I have two operations I could do for finding a user:

  1. User gives me their username, I find them in the database, I compare the hash with bcrypt.compareSync.
  2. User gives me the email, I hash it with bcrypt.hashSync, then use the resulting hash to search the database.

Is the performance of bcrypt.compareSync the same as bcrypt.hashSync? Learning from my mistake of just assuming they're different, I went to the code (this is in bcryptjs) for the bcrypt.compareSync function:

bcrypt.compareSync = function(s, hash) {
    if (typeof s !== "string" || typeof hash !== "string")
        throw Error("Illegal arguments: "+(typeof s)+', '+(typeof hash));
    if (hash.length !== 60)
        return false;
    return safeStringCompare(bcrypt.hashSync(s, hash.substr(0, hash.length-31)), hash);
};

Well there we go, bcrypt.compareSync just calls bcrypt.hashSync. The performance is the same, or worse because it also does a safeStringCompare.

You might wonder why it's not using the JavaScript === to test that the strings are equal. The equality operator in JavaScript will stop comparing two strings at the first different character to make the comparison fast. If you have a string that's wrong at the first character, this is very fast. If it's wrong at the last character, the === comparison takes longer. An attacker can use that difference in timing to slowly work out the expected value one character at a time, which is faster than brute force. Using safeStringCompare forces the comparison to check every character of the resulting hash to prevent this timing attack.
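For illustration, here is a small sketch of the same idea using Node's built-in `crypto.timingSafeEqual` instead of bcryptjs's internal safeStringCompare. The `safeCompare` name is hypothetical, and this is not the bcryptjs implementation, just one way to get a comparison that doesn't stop at the first mismatch.

```javascript
const crypto = require('crypto');

// Compare two strings without bailing out at the first different character.
function safeCompare(expected, actual) {
  const a = Buffer.from(expected, 'utf8');
  const b = Buffer.from(actual, 'utf8');
  // timingSafeEqual throws when the lengths differ, so reject those first.
  // bcrypt hashes have a fixed length, so the length check reveals little.
  if (a.length !== b.length) return false;
  return crypto.timingSafeEqual(a, b);
}

console.log(safeCompare('abc', 'abc')); // true
console.log(safeCompare('abc', 'abd')); // false
```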

This means the two methods should be close to the same performance as long as we're indexing the username and email_hash fields in the database, which is a given in this analysis.

The Order is Important

The real question is whether asking for a username first helps with preventing Denial of Service (DoS) attacks. Let's look at the scenarios again, but break them down into a list of imaginary function calls:

// username first method
let [username, given_email] = forgot_page();
let user = User.find_by_username(username);
if(user) {
    // compareSync takes the plain text first, then the stored hash
    if(bcrypt.compareSync(given_email, user.email_hash)) {
        // send them the reset email
    }
}

In this pseudo-code, if my server is given a bad username, it only pays for the attempted username lookup, and not the bcrypt operation. This is important because bcrypt is expensive, so avoiding it where possible helps prevent DoS attacks.

Comparing that to the pseudo-code for the "just hash the email" method, we get:

// just use the email hash
let given_email = forgot_page();
// salt has to be the same fixed value for every email, or the hashes won't match
let email_hash = bcrypt.hashSync(given_email, salt);
let user = User.find_by_email_hash(email_hash);
if(user) {
    // send them the reset email
}

In this scenario I pay for the bcrypt hash and database lookups no matter what I do. There's no way around it, so this becomes the most expensive way to do this.
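To make the cost difference concrete, here is a small sketch that counts how many expensive hash calls each flow makes for a bogus request. Everything in it is hypothetical: `expensiveHash` is a counting stand-in for bcrypt, the `users` array replaces the database, and the plain `===` is only there to keep the counting example short.

```javascript
let hashCalls = 0;

// Stand-in for the expensive bcrypt call; it only counts invocations.
function expensiveHash(text) {
  hashCalls += 1;
  return 'hash(' + text + ')';
}

// Hypothetical user table with one registered user.
const users = [{ username: 'zed', email_hash: expensiveHash('zed@example.com') }];

function usernameFirst(username, givenEmail) {
  const user = users.find((u) => u.username === username);
  if (!user) return false; // bad username: no hash is ever computed
  return expensiveHash(givenEmail) === user.email_hash;
}

function emailOnly(givenEmail) {
  const emailHash = expensiveHash(givenEmail); // always paid, match or not
  return users.some((u) => u.email_hash === emailHash);
}

// Reset the counter, run one request, and report the hashes it cost.
function countHashCalls(request) {
  hashCalls = 0;
  request();
  return hashCalls;
}

console.log(countHashCalls(() => usernameFirst('nobody', 'x@example.com'))); // 0
console.log(countHashCalls(() => emailOnly('x@example.com')));               // 1
```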

The next question becomes:

Can You Stop The DoS?

Imagine I want to let people recover both a forgotten password and a forgotten username. One option is a two-step design:

  1. First present users with a username based form, and use the faster DoS resistant username lookup.
  2. If they forget their username then let them submit just their email and use the bcrypt(email) lookup version.

Am I really preventing a DoS in this situation, since any attacker just has to go to the page in #2 and slam it? If my goal in this more complicated design is to prevent DoS, then I am not doing that. I'm just making it slightly more complicated for someone to do it, which means I'm not preventing anything.

This added complexity is also harder to secure since now I have to worry about two avenues into the application, and it's also less usable for the end user which increases possible mistakes they'd make.

Keeping it Simple and Usable

The security benefit for the user in this design is simply that their email is not stored on my server to remove tracking and abuse potential. The security risk for me is that a malicious user might attack the longer bcrypt processing and crash my site. What I need is a way to give the user their desired security benefit while also mitigating the DoS attack potential.

The most usable thing is just, "You forgot your password or username? What's your email? I'll send you a reset link." That's it. The process for this is then:

let given_email = forgot_page();
let email_hash = bcrypt.hashSync(given_email, salt);
let user = User.find_by_email_hash(email_hash);
if(user) {
    send_reset(given_email, user.username);
    // email is then dropped rather than stored
}

They'll get their username included in the email, but we can add or remove that depending on what they say they forgot. It ultimately doesn't improve security since, if someone has access to their email, they're in trouble anyway. Just send the username and be done with it.

Reducing Abuse Potential

I now need to figure out how to solve the problem of Denial of Service abuse, and of spamming people with these reset requests. The problem this system has, like every password reset system, is that a malicious user can submit another person's email repeatedly to:

  1. Cause a DoS that makes the site crash from excessive bcrypt calls.
  2. Fill the target email user's inbox with password reset emails they didn't request.

The DoS problem can be solved by simply not doing the bcrypt, database query, and email send in the web server. There's no need to do this real time, so we can just reply in the browser with, "If that email address is valid you will receive a reset email." We can then simply use a queue and job processor to do the real work based on submitted requests from the web server.
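A rough sketch of that split, assuming a plain in-memory array in place of a real queue so it runs on its own. The names `resetQueue`, `handleForgotRequest`, and `processNextJob` are hypothetical, and `findUser` and `sendReset` are callbacks the caller would supply (the expensive hash and lookup would live inside `findUser`).

```javascript
// Hypothetical in-memory queue standing in for a real job queue.
const resetQueue = [];

// Web handler: no hashing, no database query, no email send.
function handleForgotRequest(givenEmail) {
  resetQueue.push({ email: givenEmail });
  // Same reply whether or not the email exists, so nothing is revealed.
  return 'If that email address is valid you will receive a reset email.';
}

// Worker: in a real deployment this would run in a separate process.
function processNextJob(findUser, sendReset) {
  const job = resetQueue.shift();
  if (!job) return false; // nothing waiting
  const user = findUser(job.email); // the expensive hash + lookup happens here
  if (user) sendReset(job.email, user.username);
  return true; // the job object, and the email in it, is now dropped
}
```

The web handler's cost stays constant no matter how many requests arrive, and the worker can be slowed down or paused without touching the rest of the site.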

Now if an attacker tries to DoS the webserver they're really only wrecking a single service that's just doing password resets. If that fails then everyone else can at least keep using the system without interruption. It's also so low priority that it can be shut down temporarily if it's under attack.

To prevent the spamming of someone's account with password resets we really just need to add some throttling to the requests so that only 1 every 24 hours is allowed. I can either tell them that's the limit, or just pretend I accepted it but drop the request.
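A minimal sketch of that throttle, assuming an in-memory `Map` and a hypothetical `allowReset` function. The `key` is whatever identifies the target of the reset (the email hash would be one choice), and `now` is passed in only so the behavior is easy to test. A real version would need shared storage if more than one worker process handles requests.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical record of when each key last had a reset accepted.
const lastAccepted = new Map();

// Returns true if the request may proceed, false if it should be dropped.
function allowReset(key, now = Date.now()) {
  const last = lastAccepted.get(key);
  if (last !== undefined && now - last < DAY_MS) {
    return false; // already accepted one within the last 24 hours
  }
  lastAccepted.set(key, now);
  return true;
}
```

Whether a `false` result becomes a "one per day" message or a silent drop is a choice for the caller, which matches the two options described above.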

The Problem with Queues

Keep in mind that I'm not trying to solve the case of some attacker getting on the servers and grabbing emails as they go through. That's a nearly impossible case to solve since once someone has root, all your security is pointless. All I'm trying to do is reduce the risk that users have with giving out their emails. I want them to see that giving me their email is only so they can reset their password if they forget it, and to add some extra protection in case the database is exposed.

In order to offload this processing out of the web server I'd have to use my Bull queue system to handle the requests. My Bull queue processors are just small Node processes hanging out on a Redis queue, so they are easy to move around and maintain. The only problem I have to investigate is if these queue messages stay around in the Redis server.

My understanding is Bull removes the messages from the queue, so a person's email address should only be in Redis for a short amount of time. If Bull doesn't delete them right away, then I have to add code to make sure they're destroyed immediately.
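One thing worth checking is Bull's per-job options `removeOnComplete` and `removeOnFail`, which ask Bull to delete a job's data from Redis once it finishes or fails. The sketch below assumes `queue` is a Bull queue instance created elsewhere; `enqueueReset` is a hypothetical helper, and the retention behavior should be verified against the Bull version actually in use.

```javascript
// `queue` is assumed to be a Bull Queue (anything with a compatible add()).
function enqueueReset(queue, givenEmail) {
  return queue.add(
    { email: givenEmail },
    {
      removeOnComplete: true, // drop the job data once it succeeds
      removeOnFail: true,     // also drop it when the job fails
    }
  );
}
```

Setting `removeOnFail` means a failed reset leaves nothing behind to inspect or retry, which is the price of not keeping the email around.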

Next Steps

This should be enough thought about the problem to actually implement it, so the next release of my code should include this new authentication system with the obfuscated emails and password reset. I'll have more updates on this as I actually implement it, and if you have ideas, criticisms, or comments feel free to let me know.
