Case Study: Dropbox / Google Drive

1. Requirements clarifications (Functional & Non-Functional)

Functional

  • Users can upload and download files from any device.
  • File synchronization across multiple devices.
  • File versioning and history (recovery).
  • Sharing files and folders with other users.

Non-Functional

  • Durability: Stored data should not be lost (target: 99.999999999%, i.e. eleven nines of durability).
  • Availability: High availability for file access.
  • Efficiency: Minimize bandwidth usage (incremental updates).

2. System interface definition (APIs)

  • uploadFile(file_data, metadata)
  • downloadFile(file_id, version)
  • listUpdates(cursor) (used by clients for sync)
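
A minimal sketch of how a cursor-based listUpdates call could behave, assuming the server keeps an append-only change log per user. The ChangeLog class, the has_more flag, and the limit parameter are hypothetical names introduced for illustration; the text above only specifies listUpdates(cursor).

```python
from dataclasses import dataclass, field


@dataclass
class ChangeLog:
    """Hypothetical append-only log of file changes for one user."""
    entries: list = field(default_factory=list)

    def append(self, file_id: str, version: int) -> None:
        self.entries.append({"file_id": file_id, "version": version})

    def list_updates(self, cursor: int = 0, limit: int = 100) -> dict:
        """Return changes recorded after `cursor`, plus the cursor to send next time."""
        page = self.entries[cursor:cursor + limit]
        next_cursor = cursor + len(page)
        return {
            "changes": page,
            "cursor": next_cursor,
            "has_more": next_cursor < len(self.entries),
        }


log = ChangeLog()
log.append("f1", 1)
log.append("f2", 1)
first = log.list_updates(cursor=0)                   # initial sync: both changes
log.append("f1", 2)
second = log.list_updates(cursor=first["cursor"])    # later sync: only the new change
```

Because the client stores only the last cursor it received, a device that was offline can resume from where it stopped instead of re-listing everything.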

3. Back-of-the-envelope estimation (Capacity Estimation)

  • Users: 500M total, 10M daily active users (DAU).
  • Storage: Average 10GB per user → 500M users × 10GB = 5 exabytes (EB) total storage.
  • Traffic: 100M file uploads/updates per day.
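
The figures above can be checked with a few lines of arithmetic. This sketch uses decimal units (1 EB = 10^9 GB); PEAK_FACTOR is an assumption added for illustration and does not come from the estimates above.

```python
TOTAL_USERS = 500_000_000
DAILY_ACTIVE_USERS = 10_000_000
AVG_STORAGE_PER_USER_GB = 10
UPLOADS_PER_DAY = 100_000_000
SECONDS_PER_DAY = 24 * 60 * 60

# Storage: 500M users x 10 GB each, converted from GB to EB.
total_storage_gb = TOTAL_USERS * AVG_STORAGE_PER_USER_GB
total_storage_eb = total_storage_gb / 1_000_000_000

# Traffic: average upload rate and per-active-user activity.
avg_uploads_per_second = UPLOADS_PER_DAY / SECONDS_PER_DAY
uploads_per_dau = UPLOADS_PER_DAY / DAILY_ACTIVE_USERS

PEAK_FACTOR = 2  # assumption: peak load is about twice the average
peak_uploads_per_second = avg_uploads_per_second * PEAK_FACTOR

print(f"Total storage: {total_storage_eb:.0f} EB")
print(f"Average uploads per second: {avg_uploads_per_second:.0f}")
print(f"Uploads per active user per day: {uploads_per_dau:.0f}")
```

So 100M uploads per day works out to roughly 1,157 uploads per second on average, or about 10 uploads per daily active user.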

4. Defining data model (Database Schema/Model)

  • Metadata DB: Relational (MySQL/PostgreSQL) for ACID properties.
    • User (user_id, name, email)
    • File (file_id, name, path, is_directory, parent_id, latest_version)
    • FileVersion (version_id, file_id, device_id, checksum, size, timestamp)
  • Block Storage: Amazon S3 or similar object store for actual file chunks.
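
As a rough illustration, the three metadata tables could be declared as follows. The sketch uses Python's built-in sqlite3 module only so that it runs without a server; the column types, the plural table names, and the sample rows are assumptions, and a production deployment would use MySQL or PostgreSQL as stated above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    email   TEXT NOT NULL UNIQUE
);
CREATE TABLE files (
    file_id        INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    path           TEXT NOT NULL,
    is_directory   INTEGER NOT NULL DEFAULT 0,
    parent_id      INTEGER REFERENCES files(file_id),
    latest_version INTEGER  -- points at file_versions.version_id
);
CREATE TABLE file_versions (
    version_id INTEGER PRIMARY KEY,
    file_id    INTEGER NOT NULL REFERENCES files(file_id),
    device_id  TEXT NOT NULL,
    checksum   TEXT NOT NULL,
    size       INTEGER NOT NULL,
    timestamp  INTEGER NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Sample rows (made up for the example): a folder, a file in it, one version.
conn.execute("INSERT INTO users VALUES (1, 'Ada', 'ada@example.com')")
conn.execute("INSERT INTO files VALUES (10, 'docs', '/docs', 1, NULL, NULL)")
conn.execute("INSERT INTO files VALUES (11, 'a.txt', '/docs/a.txt', 0, 10, NULL)")
conn.execute(
    "INSERT INTO file_versions VALUES (100, 11, 'laptop', 'abc123', 42, 1700000000)"
)
conn.execute("UPDATE files SET latest_version = 100 WHERE file_id = 11")

# Resolve a file to the checksum of its latest version.
row = conn.execute(
    "SELECT f.path, v.checksum FROM files f "
    "JOIN file_versions v ON v.version_id = f.latest_version "
    "WHERE f.file_id = 11"
).fetchone()
```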

5. High-level design (with Mermaid)

graph TD
    A[Mobile/Desktop Client] --> B[Load Balancer]
    B --> C[API Servers]
    C --> D[Metadata DB]
    C --> E[Block Service]
    E --> F[Object Storage - S3]
    G[Notification Service] --> A
    C --> G
    H[Sync Service] --> D

6. Detailed design (Deep dive into components)

Block Level Storage

Files are split into fixed-size chunks (e.g., 4MB). Only modified chunks are re-uploaded.

  • Deduplication: Checksums (SHA-256) are used to identify identical blocks across the entire system to save space.
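
A minimal sketch of fixed-size chunking with content-addressed deduplication. BlockStore is a hypothetical in-memory stand-in for the object store, and the example uses a 4-byte chunk size purely to keep the data readable; the 4MB default matches the text.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB, as in the text


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


class BlockStore:
    """Hypothetical in-memory block store keyed by SHA-256 digest."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def put(self, chunk: bytes) -> str:
        digest = hashlib.sha256(chunk).hexdigest()
        # An identical chunk is stored only once, whoever uploads it.
        self.blocks.setdefault(digest, chunk)
        return digest

    def put_file(self, data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
        """Store a file and return its manifest: the ordered list of chunk digests."""
        return [self.put(chunk) for chunk in split_into_chunks(data, chunk_size)]

    def get_file(self, manifest: list[str]) -> bytes:
        return b"".join(self.blocks[digest] for digest in manifest)


store = BlockStore()
manifest_a = store.put_file(b"AAAABBBBCCCC", chunk_size=4)
manifest_b = store.put_file(b"AAAAXXXXCCCC", chunk_size=4)  # shares two chunks with A
```

Two files of three chunks each end up occupying four stored blocks, because the first and last chunks are identical and share a digest.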

Client-Side Sync

The client keeps a local database (SQLite) to track file states.

  • Chunking: Files are chunked locally.
  • Watchdog: Monitors local file changes.
  • Differential Sync: Only sends the delta (modified blocks).
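
One way the client could decide which blocks to send is to compare the chunk hashes of the new file against those recorded for the previous version. The function names below are hypothetical, and the 4-byte chunk size is only for readability.

```python
import hashlib


def chunk_hashes(data: bytes, chunk_size: int) -> list[str]:
    """SHA-256 digest of each fixed-size chunk, in file order."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def changed_chunk_indexes(old_hashes: list[str], new_hashes: list[str]) -> list[int]:
    """Indexes in the new file whose chunk is new or differs from the old version."""
    return [
        i for i, digest in enumerate(new_hashes)
        if i >= len(old_hashes) or old_hashes[i] != digest
    ]


old = chunk_hashes(b"AAAABBBBCCCC", chunk_size=4)
new = chunk_hashes(b"AAAAXXXXCCCCDD", chunk_size=4)
to_upload = changed_chunk_indexes(old, new)  # chunk 1 was edited, chunk 3 was appended

# Limitation of fixed-size chunks: inserting one byte at the front shifts
# every later chunk boundary, so every chunk looks modified.
shifted = changed_chunk_indexes(
    chunk_hashes(b"AAAABBBB", chunk_size=4),
    chunk_hashes(b"ZAAAABBBB", chunk_size=4),
)
```

The second comparison shows why in-place edits and appends benefit most from this scheme, while insertions near the start of a file do not.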

Metadata Cache

To speed up lookups, metadata for active users is cached in Redis.
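
The usual pattern here is cache-aside: read from the cache, fall back to the database on a miss, and invalidate on write. The sketch below uses a plain dict in place of Redis so that it runs standalone; MetadataCache, the TTL value, and the load callback are all assumptions made for illustration.

```python
import time


class MetadataCache:
    """Cache-aside wrapper; a dict stands in for Redis in this sketch."""

    def __init__(self, load_from_db, ttl_seconds: float = 60.0, clock=time.monotonic):
        self._load_from_db = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # file_id -> (expires_at, metadata)

    def get(self, file_id):
        entry = self._entries.get(file_id)
        if entry is not None and entry[0] > self._clock():
            return entry[1]  # cache hit
        metadata = self._load_from_db(file_id)  # miss or expired: go to the DB
        self._entries[file_id] = (self._clock() + self._ttl, metadata)
        return metadata

    def invalidate(self, file_id) -> None:
        self._entries.pop(file_id, None)


db = {"f1": {"name": "a.txt", "latest_version": 1}}
db_reads = []


def load(file_id):
    db_reads.append(file_id)
    return dict(db[file_id])


cache = MetadataCache(load)
cache.get("f1")
cache.get("f1")                   # served from the cache, no DB read
db["f1"]["latest_version"] = 2    # a write goes to the DB first...
cache.invalidate("f1")            # ...then the stale cache entry is dropped
fresh = cache.get("f1")
```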

7. Identifying and resolving bottlenecks (Scaling/Bottlenecks)

  • Metadata DB Scaling: As the number of files grows into trillions, sharding the Metadata DB becomes necessary.
  • Notification Latency: Near-real-time sync across devices requires a notification service that holds a connection open to each client, using long polling or WebSockets.
  • Upload Speed: Use edge locations (CDN points of presence) to terminate connections closer to the user.
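
As a sketch of the sharding point above, one simple option is to hash a shard key and take the result modulo the shard count. Using user_id as the shard key and 16 shards are assumptions for illustration; the text does not specify either.

```python
import hashlib

NUM_SHARDS = 16  # assumption: the number of metadata DB shards


def shard_for_user(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user to a shard with a stable hash, so every server agrees."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# All of one user's files land on the same shard, so listing a folder
# or syncing a device touches a single database.
shard = shard_for_user("user-42")
```

A caveat with plain modulo placement is that changing the shard count remaps most keys; consistent hashing or a lookup directory are common ways to soften that.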

Likely Follow-Up Questions

How do we efficiently handle very large files (e.g., several GBs)?

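Files are broken into fixed-size chunks (e.g., 4MB), so a multi-gigabyte file is transferred as many small, independent uploads rather than one long request. An interrupted transfer can resume from the last completed chunk, and on later edits only the chunks that changed are re-uploaded and synced, which significantly reduces bandwidth and storage usage (Differential Sync).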

How do we ensure data consistency across multiple devices?

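We use a centralized Metadata Database as the single source of truth for file versions. When a client modifies a file, it commits the change to the metadata server, which then notifies other connected clients via the notification service (long polling/WebSockets). Those clients call listUpdates(cursor) to fetch the changes they have not yet seen.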

How do we handle file versioning and recovery?

Each change to a file adds a new FileVersion record and updates latest_version on the File record. Chunks referenced only by older versions are kept in storage for a set period (e.g., 30 days), allowing users to roll back to previous states.
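
A minimal sketch of that retention rule, assuming each version stores the list of chunk hashes it is made of. The chunk_hashes field and both function names are hypothetical additions; they are not part of the data model given earlier.

```python
from dataclasses import dataclass

DAY = 24 * 60 * 60
RETENTION_SECONDS = 30 * DAY  # the 30-day example from the text


@dataclass(frozen=True)
class FileVersion:
    version_id: int
    chunk_hashes: tuple  # assumed manifest of chunk digests for this version
    timestamp: float


def restorable_versions(versions, now, retention=RETENTION_SECONDS):
    """The latest version is always kept; older ones only inside the window."""
    if not versions:
        return []
    ordered = sorted(versions, key=lambda v: v.timestamp)
    latest = ordered[-1]
    return [v for v in ordered if v is latest or now - v.timestamp <= retention]


def expired_chunks(versions, now, retention=RETENTION_SECONDS):
    """Chunks of this file that no restorable version still references."""
    kept = restorable_versions(versions, now, retention)
    live = {h for v in kept for h in v.chunk_hashes}
    return {h for v in versions for h in v.chunk_hashes} - live


history = [
    FileVersion(1, ("a", "b"), 0 * DAY),
    FileVersion(2, ("a", "c"), 20 * DAY),
    FileVersion(3, ("a", "d"), 35 * DAY),
]
kept = restorable_versions(history, now=40 * DAY)   # version 1 is 40 days old
garbage = expired_chunks(history, now=40 * DAY)     # only chunk "b" is unreferenced
```

With system-wide deduplication, a chunk that is expired for one file may still be referenced by another, so actual deletion would also need a global reference count or similar check.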

How can we optimize the upload of many small files?

Small files can be bundled into a single upload request or batch-processed to reduce the overhead of multiple HTTP requests and metadata updates.
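
One way to form such bundles is to group files greedily until a size or count limit is reached. The function name and both limits below are assumptions chosen for the example.

```python
def batch_small_files(files, max_batch_bytes, max_batch_files):
    """Group (name, size) pairs into upload batches, preserving order.

    A file larger than max_batch_bytes ends up in a batch of its own.
    """
    batches = []
    current, current_bytes = [], 0
    for name, size in files:
        over_size = current_bytes + size > max_batch_bytes
        over_count = len(current) >= max_batch_files
        if current and (over_size or over_count):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


files = [("a", 300), ("b", 500), ("c", 400), ("d", 100), ("e", 2000)]
batches = batch_small_files(files, max_batch_bytes=1000, max_batch_files=3)
```

Here five files become three requests instead of five, and the server can apply each batch's metadata updates in a single transaction.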