4. Data Model Definition¶

This step anchors the system in persistence. It explains what entities exist, how they relate, and where they should live.

What to Define¶

Core entities such as users, objects, sessions, events, messages, feeds, or counters.
Relationships between entities.
Primary keys, partition keys, and secondary indexes.
SQL versus NoSQL choice.
Data retention, archival, and deletion policies.
Which data belongs in object storage, cache, search, analytics, or a transactional database.
Which fields must be strongly consistent and which can lag.
Which data is immutable, append-only, or frequently updated.

Design Questions¶

What are the access patterns?
Which queries must be fast?
Which data needs strong consistency?
Which data can be denormalized?
How will the schema evolve over time?
Will this model need to support joins, fan-out, or aggregation?
Are there hot partitions or hot keys to avoid?

Data Modeling Choices¶

Use a relational schema when transactions and joins matter.
Use a key-value or document store when lookups dominate.
Use time-series or log-oriented storage when events arrive continuously.
Use object storage for large blobs such as images or video.
Use search indexes for text, filtering, and ranking.

SQL vs NoSQL Decision Tree¶

Choose SQL (RDBMS) When:

Structured data with rigid schema (users, transactions, orders).
Relational data with complex joins.
Need ACID transactions (banking, payments).
Developers familiar with SQL.
Examples: PostgreSQL, MySQL.

Choose NoSQL When:

Semi-structured or dynamic schema (JSON documents).
Very large data volumes (TB/PB scale).
Very high throughput (millions of writes/sec).
Eventual consistency acceptable.

NoSQL Types:

Key-Value: O(1) reads/writes (Redis, DynamoDB).
Document Store: JSON queries (MongoDB, CouchDB).
Wide Column Store: High write throughput (Cassandra, HBase).
Graph Database: Complex relationships (Neo4j).

Consistency Patterns (CAP Theorem)¶

You can guarantee 2 of 3: Consistency, Availability, Partition Tolerance. Since networks fail, choose Consistency or Availability:

Pattern	Speed	Reliability	Use Case
Strong Consistency	Slower	High	Transactions, banking
Eventual Consistency	Faster	Medium	Feeds, DNS, S3
Weak Consistency	Fastest	Low	Gaming, real-time

ACID vs BASE:

ACID (SQL): Atomicity, Consistency, Isolation, Durability → strong guarantees.
BASE (NoSQL): Basically Available, Soft state, Eventually consistent → higher throughput.

Output of This Step¶

A simple schema or entity model.
The database choice and why it fits the workload.
The read and write patterns the model must support.
The partitioning strategy and indexing approach.

Common Mistakes¶

Starting with tables before access patterns.
Ignoring partitioning and indexing.
Putting every kind of data into one store.
Designing a schema that is hard to scale or evolve.

« Previous Next »