Case Study: Ticketmaster¶
1. Requirements clarifications (Functional & Non-Functional)¶
Functional¶
- Users can search for events (concerts, sports) by name, location, or date.
- Users can view a real-time seating map and select specific seats.
- Users can reserve seats for a limited time (5-10 minutes) while completing payment.
- Users can securely purchase tickets and receive confirmation.
Non-Functional¶
- High Concurrency: Must handle massive traffic spikes when popular events go on sale.
- Fairness: Tickets should be allocated on a first-come, first-served basis.
- Transaction Integrity: Strict ACID properties to prevent double-booking of the same seat.
- Low Latency: Real-time updates on seat availability to provide a seamless user experience.
2. System interface definition (APIs)¶
searchEvents(query, location, date): Returns a list of events matching the search criteria.getSeatingMap(event_id): Retrieves the layout and current status of all seats for a specific event.reserveSeats(event_id, seat_ids, user_id): Places a temporary lock on selected seats for the user.confirmPayment(reservation_id, payment_info): Finalizes the purchase and converts reservations into confirmed tickets.
3. Back-of-the-envelope estimation (Capacity Estimation)¶
- Traffic: Anticipate 10M+ page views within minutes for high-demand event launches.
- Booking Rate: Must support at least 5,000 ticket transactions per second.
- Storage: Millions of events, each potentially having thousands of seats, requiring efficient indexing and storage.
4. Defining data model (Database Schema/Model)¶
- Event Table:
event_id (PK), name, venue_id, start_time, description. - Seat Table:
seat_id (PK), event_id (FK), row, number, status (Available, Reserved, Sold). - Reservation Table:
res_id (PK), user_id (FK), seat_id (FK), expiry_time, status. - Storage Choice: A Relational Database (SQL) is essential to leverage ACID transactions, ensuring that seat status updates are atomic and consistent.
5. High-level design (with Mermaid)¶
graph TD
A[User] --> B[Load Balancer]
B --> C[Search Service]
B --> D[Booking Service]
D --> E[(Redis - Seat Locks)]
D --> F[(SQL - Ticket DB)]
C --> G[(Elasticsearch - Event Index)]
H[Payment Service] --> D
6. Detailed design (Deep dive into components)¶
Handling High Traffic Spikes¶
- Virtual Waiting Room: Implement a queueing system (e.g., using a Token Bucket algorithm) to throttle incoming requests and allow users into the booking flow at a sustainable rate.
- CDN Integration: Cache static event details, images, and seating map templates at the edge to significantly reduce load on origin servers.
Seat Reservation (The Locking Mechanism)¶
- Selection: When a user selects a seat, the Booking Service attempts to acquire a distributed lock.
- Distributed Lock: Use Redis with a TTL (Time-To-Live) of 10 minutes:
SET seat_123 user_456 EX 600 NX. - Database Update: Upon successful locking in Redis, the status in the SQL database is updated to 'Reserved' within a transaction.
- Cleanup: If the user fails to complete the payment before the TTL, the Redis lock expires, and a background worker reverts the seat status to 'Available' in the database.
Database Sharding¶
- Shard the database by
event_id. This ensures all seat data for a specific event resides on the same shard, simplifying transaction management and improving performance for localized event queries.
7. Identifying and resolving bottlenecks (Scaling/Bottlenecks)¶
- Database Contention: Thousands of users may compete for the same 'front row' seats simultaneously. Resolution: Use row-level locking or optimistic concurrency control to manage simultaneous updates.
- Payment Processing Latency: Slow payment gateways can stall reservations. Resolution: Implement asynchronous payment processing with webhooks for status updates.
- Waiting Room Scalability: The virtual waiting room itself must be highly available and distributed to avoid becoming a single point of failure. Resolution: Use a distributed messaging queue like Kafka or a managed global load balancer.
Likely Follow-Up Questions¶
How do we handle a massive surge in traffic during a popular artist's tour sale?
We use a virtual waiting room (queue) to throttle the number of users entering the checkout flow, combined with aggressive CDN caching for the event browsing pages.
How do we prevent double-booking of the same seat?
We use distributed locking (e.g., Redis Lock or database row-level locking) with a short TTL. When a user selects a seat, it is "held" for ~10 minutes while they complete the payment.
How do we mitigate bot attacks buying up all the tickets?
We implement CAPTCHA, rate limiting per IP/Account, and behavioral analysis to detect automated scraping and purchasing.
What happens if the payment goes through but the session expires?
The system should perform a reconciliation process. If the payment is confirmed but the lock was released, it either attempts to finalize the reservation if the seat is still available or issues an automatic refund.