Consistent Hashing

Overview

Consistent hashing is a special kind of hashing technique such that when a hash table is resized and consistent hashing is used, only K/n keys need to be remapped on average, where K is the number of keys, and n is the number of slots. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped.

Key Concepts

1. The Hash Ring

Imagine the hash space as a circle (ring). Both servers and data keys are mapped to this ring using the same hash function.

  • Placement: A key is assigned to the first server it encounters while moving clockwise on the ring.

2. Virtual Nodes

To ensure a more even distribution of keys (load balancing), each physical server is represented by multiple "virtual nodes" on the ring.

  • Benefit: If a server is powerful, it can have more virtual nodes. If a server is added or removed, only a small portion of the keys are affected.

Trade-offs & Considerations

  • Load Balancing: Virtual nodes are essential to avoid "hotspots" where one server handles significantly more traffic than others.
  • Complexity: Implementing and managing a consistent hashing ring is more complex than simple modulo hashing (key % n).
  • Use Cases: Used in distributed caches (Memcached), distributed databases (Cassandra, DynamoDB), and load balancers.

Further Reading