Database for HLD
In system design interviews, databases are one of the most critical components you’ll be asked about.
This page is a one-pager summary of key database concepts for HLD interviews, with links to full deep-dive articles.
1. SQL vs NoSQL
- SQL (Relational): structured schema, ACID transactions, strong consistency.
Best for: financial systems, structured data, strict constraints. - NoSQL (Non-relational): flexible schema, BASE consistency, scale horizontally.
Types: Document (MongoDB), Key-Value (Redis), Wide-column (Cassandra), Graph (Neo4j).
Best for: high scale, flexible schemas, analytics.
🔗 Deep dive: SQL vs NoSQL
2. Scaling Strategies
- Vertical scaling: bigger machine (CPU/RAM). Easy but limited.
- Horizontal scaling: add more machines. Needs sharding, replication, load balancing.
- Replication:
- Primary-Replica (one write, many reads).
- Multi-primary (many writes, conflict resolution needed).
- Sharding: split data across nodes (range, hash, directory).
🔗 Deep dive: Sharding vs Replication
🔗 Deep dive: Scaling Patterns
3. Indexing & Query Optimization
- Index = auxiliary structure to speed up lookups.
- Types: B-Tree, Hash, Composite, Full-text, Geospatial.
- Trade-offs: faster reads, but slower writes & extra storage.
🔗 Deep dive: Database Indexing
🔗 Deep dive: Caching & Query Optimization
4. Consistency Models
- CAP Theorem: can only have 2 of {Consistency, Availability, Partition tolerance}.
- PACELC: If Partition → choose between Availability or Consistency. Else Latency vs Consistency.
- Trade-offs:
- Banking → Strong Consistency.
- Social feeds → Eventual Consistency is acceptable.
🔗 Deep dive: CAP & Consistency Models
5. Specialized Databases
- Key-Value: fast lookups (Redis, DynamoDB).
- Document: JSON-like (MongoDB).
- Wide-Column: huge scale, denormalized (Cassandra, Bigtable).
- Graph: relationships, recommendations (Neo4j).
- Search: inverted indexes (Elasticsearch).
- Time-Series: metrics, logs (InfluxDB, Timescale).
🔗 Deep dive: Specialized Databases
6. Distributed Transactions & Patterns
- 2PC: strong consistency, but slow, blocking.
- 3PC: adds recovery, rarely used.
- Saga pattern: split long transactions into compensating steps.
- CQRS: separate read/write models.
- Event sourcing: store state as events.
🔗 Deep dive: Distributed Transactions
Interview Tips
- Always clarify: “Do we need strong consistency or is eventual consistency acceptable?”
- State assumptions: DAU, QPS, read/write ratio, object size.
- Call out trade-offs (consistency vs availability, cost vs performance).
- Mention caching + indexing first for query optimization.
- Don’t forget replication factor in storage estimates.
✅ Summary
- Pick SQL vs NoSQL based on requirements.
- Use replication for availability, sharding for scale.
- Indexes & caches speed up reads, but add cost/trade-offs.
- Choose the right consistency model for the problem.
- Specialized databases exist for special needs (time-series, graph, search).
For deeper study, explore the full Database Series Hub.