System Design Mindset
This page teaches the way of thinking you should bring to any High-Level Design (HLD) question — whether it’s an interview or a real-world architecture discussion.
Good system design is mostly about clarity, structure, and trade-offs. Start with what the system must do, then decide how well it must do it, then design incrementally with clear checkpoints.
1. Quick summary (the short elevator version)
- Clarify requirements — ask about functional and non-functional requirements (NFRs).
- Identify constraints & assumptions — resources, deadlines, legacy tech.
- Estimate workload — QPS, throughput, storage, latency targets.
- Sketch a simple design (monolith) that satisfies requirements.
- Find bottlenecks and iterate: caching, replication, partitioning, async.
- Call out trade-offs at every step and show monitoring & failure plans.
2. Functional vs Non-Functional Requirements
Functional requirements (the what): concrete features and behaviors.
- Example items: create user, upload image, send message, redirect short URL.
Non-functional requirements (NFRs) (the how well): quality attributes and constraints.
- Scalability (e.g., must handle 100k QPS).
- Latency (e.g., tail latency < 200 ms).
- Availability (e.g., 99.95% uptime).
- Consistency (strong vs eventual).
- Durability, security, cost, maintainability, compliance.
Interview habit: always repeat requirements back and ask which NFRs are critical. For example: “Do you want strict consistency for this API, or is eventual consistency acceptable?”
3. Clarify constraints & assumptions (fast)
Ask explicit questions before designing. Typical constraints:
- Time: do you have 15 minutes to design or 45 minutes?
- Budget: can we use managed cloud services, or must the design be cost-sensitive?
- Existing systems: any legacy DBs or languages?
- Geographic: single region or global users?
- Team: how many engineers for maintenance?
State assumptions out loud if the interviewer doesn’t specify them:
“I’ll assume 1M daily active users, ~10M requests/day, and low-latency reads are most important. If that’s wrong, tell me.”
4. Find bottlenecks — what to look for
Bottlenecks depend on workload. Common hotspots:
- CPU — expensive compute (image processing, ML inference).
- Memory — large in-memory caches, per-connection state.
- Disk / IOPS — DB writes, backups, compaction.
- Network — bandwidth-heavy workloads (video).
- DB connections — connection limits & locks.
- Single Point of Failure — single DB, single load balancer, etc.
How to present this: show a quick table linking requirements → likely bottlenecks. Example: “If the system is write-heavy, DB write throughput is the likely bottleneck.”
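A few more illustrative (not exhaustive) pairings:
- Read-heavy, latency-sensitive → cache hit rate and DB read IOPS.
- Large media upload/download → network bandwidth and storage cost.
- Fan-out features (feeds, notifications) → queue depth and worker throughput.
- Global user base → cross-region latency and replication lag.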
5. Step-by-step scaling roadmap (the interview-friendly flow)
- Start simple — single app server + single DB (monolith).
- Optimize vertically — bigger machine, better disks (short-term).
- Add cache & read replicas — reduce DB load for reads (a read/write-splitting sketch follows at the end of this section).
- Partition (shard) and distribute — split data when a single DB can no longer hold the dataset or keep up with writes.
- Move to microservices only when domains require independent scaling.
- Geo-distribute with replication & edge caches for global users.
- Automate & observe — add autoscaling, monitoring, distributed tracing.
Always justify each move: don’t jump to microservices or sharding without explaining why.
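To make step 3 concrete, here is a minimal sketch of read/write splitting, assuming one primary and a list of read replicas; the connection objects and their `execute` method are placeholders, not a specific driver API.

```python
import random

class ReadWriteSplitRouter:
    """Send writes to the primary and spread reads across replicas (illustrative only)."""

    def __init__(self, primary, replicas):
        self.primary = primary      # the single writable node (source of truth)
        self.replicas = replicas    # read-only copies kept up to date by replication

    def write(self, sql, params=()):
        # Writes always go to the primary so there is one source of truth.
        return self.primary.execute(sql, params)

    def read(self, sql, params=()):
        # Reads are load-balanced across replicas; fall back to the primary if none exist.
        node = random.choice(self.replicas) if self.replicas else self.primary
        return node.execute(sql, params)
```

Replica reads can be slightly stale because of replication lag; that is the same consistency trade-off called out in the decision cues below.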
6. Decision cues — when to do what
- Use caching when reads >> writes and latency is critical (see the cache-aside sketch after this list).
- Use read replicas when read scaling is needed but writes are limited.
- Shard when dataset size or write throughput exceeds single-node capacity.
- Use specialized DBs when problem fits (graph for relationships, time-series for metrics).
- Use asynchronous processing for non-critical or long-running tasks (image processing, emails).
- Prefer eventual consistency in user-facing features where slight staleness is acceptable (feeds), but require strong consistency for money transfers.
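The first cue (reads >> writes, latency-critical) is usually implemented as cache-aside. A minimal sketch, assuming a plain dict as an in-process stand-in for Redis/Memcached and a caller-supplied `db_lookup` function:

```python
import time

CACHE_TTL_SECONDS = 300   # assumed TTL; bounds how stale a cached value can get
_cache = {}               # stand-in for Redis/Memcached: key -> (value, expires_at)

def get_with_cache(key, db_lookup):
    """Cache-aside read: serve from cache if fresh, otherwise read the DB and repopulate."""
    entry = _cache.get(key)
    if entry is not None and entry[1] > time.time():
        return entry[0]                                   # cache hit: no DB round trip
    value = db_lookup(key)                                # cache miss: go to the source of truth
    _cache[key] = (value, time.time() + CACHE_TTL_SECONDS)
    return value
```

The same shape applies with a real cache server; the trade-off is a staleness window bounded by the TTL, plus one more component to operate and monitor.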
7. A small worked example — URL Shortener (brief)
Clarify requirements
- Functional: create short URL, redirect, track analytics.
- NFRs: low latency on redirects (<50ms), high availability, 100M short links stored.
Capacity sketch (example arithmetic shown step-by-step)
Suppose: 1,000,000 daily active users, each performs 10 redirects/day → total requests/day = 1,000,000 × 10 = 10,000,000 requests/day.
Convert to requests/sec:
- Seconds/day = 24 × 3600 = 86,400.
- Requests/sec = 10,000,000 ÷ 86,400 ≈ 115.74.
- So average QPS ≈ 116 req/s. Plan for peak: use a 10× factor → ~1,160 req/s.
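The same arithmetic as a tiny script, handy for double-checking estimates while practicing (all numbers are the assumptions stated above):

```python
DAU = 1_000_000            # assumed daily active users
REDIRECTS_PER_USER = 10    # assumed redirects per user per day
PEAK_FACTOR = 10           # rough multiplier for traffic spikes

requests_per_day = DAU * REDIRECTS_PER_USER      # 10,000,000
seconds_per_day = 24 * 3600                      # 86,400
avg_qps = requests_per_day / seconds_per_day     # ≈ 115.74
peak_qps = avg_qps * PEAK_FACTOR                 # ≈ 1,157, i.e. ~1,160 rounded

print(f"average ≈ {avg_qps:.0f} req/s, planned peak ≈ {peak_qps:.0f} req/s")
```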
From here:
- One small web server can serve thousands of cached redirects per second; use a CDN to cache hot redirect responses.
- Data size: 100M short-links × 100 bytes/link ≈ 10,000,000,000 bytes ≈ 10 GB (double-check assumptions for metadata).
Simple architecture
- A web layer with CDN for redirects.
- Key-value store (Redis/DynamoDB) for the short→long mapping (fast reads).
- A write path that writes to the durable DB and populates the cache (sketched below).
- Async pipeline for analytics.
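A minimal sketch of the write and read paths, using plain dicts and an in-process queue as stand-ins for the real key-value store, durable DB, and message broker:

```python
import json
import queue

analytics_queue = queue.Queue()   # stand-in for Kafka/SQS: keeps analytics off the hot path

def create_short_url(code, long_url, durable_db, cache):
    """Write path: persist to the durable store first, then warm the cache."""
    durable_db[code] = long_url        # source of truth (e.g. a DynamoDB table)
    cache[code] = long_url             # so the first redirect is already a cache hit
    return code

def redirect(code, durable_db, cache):
    """Read path: cache first, fall back to the durable store, record analytics asynchronously."""
    long_url = cache.get(code)
    if long_url is None:
        long_url = durable_db.get(code)      # cache miss: read the source of truth
        if long_url is None:
            return None                      # unknown code: the web layer returns 404
        cache[code] = long_url               # repopulate for the next request
    analytics_queue.put(json.dumps({"code": code}))   # fire-and-forget; the async pipeline consumes it
    return long_url                          # the web layer turns this into a 301/302
```

The analytics write never blocks the redirect, which is what keeps the low-latency target realistic.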
Trade-offs
- Storing canonical mapping in a durable DB vs caching for latency.
- Collision handling (hash length decisions; illustrated below).
- Analytics can be eventually consistent.
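To make the collision-handling trade-off concrete, two common approaches are base62-encoding a unique numeric ID (collision-free by construction, but codes reveal creation order) and hashing the long URL with a retry on collision. A minimal sketch of both, with a plain dict standing in for the store:

```python
import hashlib

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62(n):
    """Encode a unique numeric ID (e.g. from an ID-generation service); never collides."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return "".join(reversed(out))

def hash_code(long_url, store, length=7):
    """Hash the URL and truncate; shorter codes need an explicit collision check."""
    digest = hashlib.sha256(long_url.encode()).hexdigest()
    for start in range(0, len(digest) - length):
        code = digest[start:start + length]
        if store.get(code) in (None, long_url):   # free slot, or the same URL already stored
            return code
    raise RuntimeError("exhausted candidate codes; increase length")
```

Longer codes make collisions rarer but make the URLs less convenient; that is the hash-length decision above.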
8. Design process checklist (step-by-step to use in interviews)
- Clarify functional & non-functional requirements (repeat them).
- State constraints & assumptions (and ask clarifying questions).
- Estimate load (QPS, storage, bandwidth). Show any calculations briefly.
- Draw a minimal architecture that satisfies requirements.
- Identify bottlenecks in that design.
- Propose targeted optimizations (cache, replicas, partitions).
- Discuss trade-offs (cost, complexity, consistency).
- Add observability & failure plan (monitoring, retries, graceful degradation).
- Summarize — repeat final architecture and why it meets the key NFRs.
9. Interview tips — what interviewers want to hear
- Ask clarifying questions before designing.
- Speak in a structured way: requirements → constraints → architecture → scaling plan.
- Show numbers (even rough estimations) — interviewers like capacity reasoning.
- Call out trade-offs explicitly. Say which requirement you’re prioritizing (latency vs durability vs cost).
- Start simple and iterate — show incremental improvements as load grows.
- Mention observability & testing — how you’ll detect and recover from failures.
- Use real-world analogies and case studies where relevant.
10. Quick checklist to keep on hand
- Have I asked about functional & non-functional requirements? ✅
- Have I stated assumptions and constraints? ✅
- Did I estimate QPS/throughput and storage? ✅
- Is my initial design simple and end-to-end? ✅
- Did I point out the major bottlenecks and fixes? ✅
- Did I call out trade-offs explicitly? ✅
- Did I add monitoring and failure handling? ✅
11. Next steps / links
- Continue to Workload Estimation for worked examples of capacity calculations.
- See practical patterns in Caching strategies and Sharding & Replication.