Unique Id Generation In Distributed Systems

This problem solves generating unique ids in a distributed system with the below constraints:

Ids increase based on time
Ids are numeric
Ids should fit in 64-bit number
The design should allow up to 10,000 new ids per second.

Database based id generator

We can use the auto-increment feature in a single database to manage the ids. Each server makes an atomic call to the database to increment the id by 1 for each new id request. The problem with this approach is that it doesn’t scale and introduces a single point of failure (one database fails means the system is down).
The problem can be scaled by allowing each server to cache in memory K ids and then incrementally assigning a new id per request. While this will scale much better than the earlier approach, it poses an issue where across servers, the ids don’t necessarily move up by time, and in addition, the system is still exposed to a single point of failure.

UUID based unique id generator

The UUID isn’t numeric; they are 128 bits long, they do not increase over time. While theoretically possible to have a collision between two UUIDs the probability of collision is very low. The nice thing about UUID is that they are simple to generate and require no coordination between hosts.

Twitter snowflake approach

The Twitter snowflake approach is nice. The idea is to structure the ids as:

1bit	41bits	3 bits	7 bits	12 bits
signbit (not used)	timestamp	Datacenter id	Machine id	Sequence number

Sign bit: this is reserved for future use.

Timestamp: milliseconds since custom epoc. The timestamp will increase as time passes and hence the ids can be sorted by time.

Datacenter: 2^3 = 8. Allows up to 8 data centers.

Machine: 2^7 = 128. Allows up to 128 machines per data center.

Sequence number: Allows up to 2^12 (4096) ids per second.

2^41-1 allows up to 2,199,023,255,551 milliseconds which equates to 69 years. Therefore via this approach, you can save up to 69 years of ids. After the time elapses, there will be a need to convert the ids to a new format.

A note on time: Due to time drift in clocks, servers rely on NTP to synchronize time across nodes in the data center. However, this naturally creates unreliable times. So while in the above design we mentioned that unique ids can be sorted by time, without more careful calculation around true time (which creates an approximate true reflection of time by taking time drift into account), our ids will not be ordered correctly by time.

Other Options :

Use UUID : Its simple to use. However, ids are non numeric and cant be ordered by time. In addition, it takes 128 bits of data per id.
Use database counter : Could use a database based counter but that wouldnt scale well since the database will run into concurrency related slowdown from parallel calls. We could somewhat aleviate the concurrency by incrementing the database counters by N (i.e. 1000 per call) and then assigning counters to process in memory while going back to db when we run out of counters. This design will still not scale well, as we add more machines per data center. In addition, we dont have a clean way to distinguish ids across data centers and the ids will not reflect time.

Reference :

System Design Interview, Alex Xu