If you need to design a user registration system, how would you implement username conflict checking? When a user creates a new name, the system need to check if this name has already been used.

The first and easiest way you might think is using a database table to store all usernames with a unique constraint.
A simple SQL query could then determine if the name is used.
But what if the system has billions of users, and millions of users are using the system simultaneously(like Facebook)? The simple SQL query could become a huge bottleneck.

Then you might get a new idea: store all usernames in memory with a hash table(such as Redis). The query would be super fast.
Yes, it’s a solution. But storing billions of usernames would consume a lot of memory.
If one username is 32 bytes in size, one billion usernames would use 32GB of memory, which is huge.

How could we get both speed and memory efficiency?

The answer is Bloom Filter:
A Bloom Filter builds a hash table with a range of keys, where each key uses a single bit to indicate whether it is occupied.
Using this approach, one billion names would require only 128MB of memory.
To reduce hash conflicts, multiple hash tables with different hash salts are used.
It uses much less memory and maintains high performance.

Contents