Top 'K' Frequent Items (Heavy Hitters)

1. Requirements

Functional

  • Track the most frequent items in a real-time stream (e.g., top trending hashtags on Twitter, most played songs on Spotify).
  • Provide the top 'K' items for a specific time window (e.g., last 1 minute, 1 hour, 1 day).

Non-Functional

  • Scalability: Must handle millions of events per second.
  • Low Latency: Retrieval of Top 'K' should be near-instant.
  • Accuracy vs. Efficiency: Approximate counts are often acceptable in exchange for massive space savings (e.g., using Count-Min Sketch).

2. Capacity Estimation

  • Traffic: 10 million events per second.
  • Storage: If tracking 1 million unique items, a simple hash map would take ~100MB. However, for billion-scale items, we need probabilistic data structures.
  • Bandwidth: 10M events/sec * 100 bytes/event $\approx$ 1GB/s.

3. APIs

GET /v1/top-k?k=10&window=1h

  • Response: {"items": [{"id": "item1", "count": 5000}, ...], "window": "1h"}

4. DB Design

  • In-Memory: Redis (Sorted Sets) for real-time storage.
  • Persistent: Cassandra or DynamoDB for historical data.
  • Schema: (item_id, count, window_timestamp).

5. HLD with Mermaid

graph TD A[Client Events] --> B[API Gateway] B --> C[Kafka/Message Queue] C --> D[Stream Processor - Flink/Spark] D --> E[(Redis - Top K Cache)] D --> F[(Database - Historical)] G[Query Service] --> E

6. Detailed Design

Count-Min Sketch

To handle massive cardinality, we use a Count-Min Sketch (probabilistic data structure).

  • Uses a 2D array of counters and multiple hash functions.
  • Provides an upper bound on frequency with fixed memory.

Heavy Keepers / Space Saving Algorithm

Used to track the actual IDs of the top items, as Count-Min Sketch only tracks counts.

Data Aggregation

  • Log Aggregator: Collects logs from app servers.
  • Stream Processing: Groups items and updates counts in sliding windows.
  • Merging Results: Distributed nodes track local top-K; a central aggregator merges them to find the global top-K.

7. Bottlenecks

  • Hot Keys: A single very popular item can overwhelm a partition. Solution: Use sub-partitioning or local aggregation before sending to the global counter.
  • Accuracy: For small 'K', error rates must be strictly monitored.
  • Memory: Keeping many time windows in memory can be expensive.