Scalability & Availability Principles
Introduction
Modern systems must serve millions of users, handle petabytes of data, and deliver responses in milliseconds.
Achieving this requires more than good code — it requires architectural principles that balance scalability and availability.
In this lesson, we’ll cover:
- The difference between scalability and availability.
- Vertical vs horizontal scaling.
- CAP theorem and its trade-offs.
- Strategies for consistency, availability, and partition tolerance.
- Java-based examples of scalable designs.
- Real-world case studies (Amazon, Netflix, Banking).
- Interview Q&A.
Scalability: What & Why
Scalability is a system’s ability to handle growth in workload without sacrificing performance.
Two Types of Scaling
Vertical Scaling (Scale Up)
- Add more resources (CPU, RAM) to one machine.
- Example: Upgrading DB server from 16GB RAM → 64GB RAM.
- ✅ Simpler. ❌ Limited by hardware & cost.
Horizontal Scaling (Scale Out)
- Add more machines/nodes to share the load.
- Example: Adding more web servers behind a load balancer.
- ✅ More scalable. ❌ More complex.
Java Example – Scaling Out (Spring Boot + Load Balancer)
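import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

// Controller remains stateless for horizontal scaling
// (Order is the application's request model, not shown here)
@RestController
public class OrderController {
    @PostMapping("/order")
    public ResponseEntity<String> placeOrder(@RequestBody Order order) {
        // No instance fields are read or written, so any instance can serve any request
        return ResponseEntity.ok("Order placed: " + order.getId());
    }
}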
✅ Stateless design → multiple instances can handle requests independently.
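To make the load balancer side concrete, here is a minimal round-robin sketch. It is only an illustration: the class name `RoundRobinBalancer` and the instance names are hypothetical, and real load balancers (NGINX, AWS ELB, and so on) add health checks, weighting, and connection handling on top of this idea.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin balancer: hands out instances in rotation.
final class RoundRobinBalancer {
    private final List<String> instances;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<String> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("at least one instance is required");
        }
        this.instances = List.copyOf(instances);
    }

    /** Returns the next instance in rotation; safe to call from many threads. */
    String nextInstance() {
        // floorMod keeps the index valid even after the counter overflows
        int index = Math.floorMod(next.getAndIncrement(), instances.size());
        return instances.get(index);
    }
}
```

Because the controller holds no state, it does not matter which instance a given request lands on, which is what makes this simple rotation safe.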
Availability: What & Why
Availability is the percentage of time a system remains operational.
Common Targets
- “Two nines” → 99% uptime (~3.65 days downtime/year).
- “Five nines” → 99.999% uptime (~5.26 minutes downtime/year).
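The downtime figures follow directly from the percentage. A small sketch of the arithmetic, assuming a 365-day year (the helper name `DowntimeBudget` is hypothetical):

```java
import java.time.Duration;

// Hypothetical helper: converts an availability target into a yearly downtime budget.
final class DowntimeBudget {
    // Assumption: a 365-day year; leap years shift the numbers slightly.
    private static final Duration YEAR = Duration.ofDays(365);

    /** Returns the maximum downtime per year allowed by the given availability percentage. */
    static Duration perYear(double availabilityPercent) {
        if (availabilityPercent < 0.0 || availabilityPercent > 100.0) {
            throw new IllegalArgumentException("availability must be between 0 and 100");
        }
        double downFraction = (100.0 - availabilityPercent) / 100.0;
        return Duration.ofMillis(Math.round(YEAR.toMillis() * downFraction));
    }
}
```

For example, `DowntimeBudget.perYear(99.0)` works out to 87.6 hours (3.65 days), while `DowntimeBudget.perYear(99.999)` is a little over five minutes.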
Achieving High Availability
- Redundancy (multiple servers, failover).
- Load Balancing (distribute requests).
- Replication (data available in multiple places).
- Health Checks (automatic failover when node dies).
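Redundancy and health checks work together: requests go to the preferred node until its check fails, then move to a standby. A minimal sketch of that selection logic follows. The `FailoverRouter` class and the `healthCheck` predicate are hypothetical; in practice the check might be an HTTP ping or a heartbeat timeout.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical failover router: prefers the first node, falls back to later ones.
final class FailoverRouter {
    private final List<String> nodes;            // ordered by preference, primary first
    private final Predicate<String> healthCheck; // assumption: returns true when a node is healthy

    FailoverRouter(List<String> nodes, Predicate<String> healthCheck) {
        this.nodes = List.copyOf(nodes);
        this.healthCheck = healthCheck;
    }

    /** Returns the first node that passes its health check, or empty if every node is down. */
    Optional<String> pickNode() {
        return nodes.stream().filter(healthCheck).findFirst();
    }
}
```

An empty result means the whole redundant group is down, which is the case the availability target has to budget for.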
CAP Theorem
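Conjectured by Eric Brewer in 2000 and later proved by Seth Gilbert and Nancy Lynch: a distributed system cannot guarantee all three of the following properties at the same time, so it is often summarized as "pick two out of three":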
- Consistency (C) – Every read sees the latest write.
- Availability (A) – Every request to a non-failing node gets a response, even if that response does not reflect the latest write.
- Partition Tolerance (P) – The system continues despite network partitions.
graph TD
C[Consistency] --- A[Availability]
A --- P[Partition Tolerance]
P --- C
Implications
- In distributed systems, P is non-negotiable (networks fail).
- When a partition occurs, you must choose between C and A.
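The choice between C and A can be shown with a toy replica that behaves differently in each mode once it loses contact with its peers. This is a deliberately simplified sketch: the `Replica` class, its `Mode` enum, and the `partitioned` flag are all hypothetical, and real systems detect partitions through timeouts and quorums rather than a boolean.

```java
import java.util.Objects;
import java.util.Optional;

// Toy replica illustrating CP versus AP behavior during a partition.
final class Replica {
    enum Mode { CP, AP }

    private final Mode mode;
    private String localValue;   // last value this replica has seen
    private boolean partitioned; // assumption: set when the replica cannot reach its peers

    Replica(Mode mode, String initialValue) {
        this.mode = mode;
        this.localValue = Objects.requireNonNull(initialValue);
    }

    void setPartitioned(boolean partitioned) {
        this.partitioned = partitioned;
    }

    /** CP refuses to answer while partitioned; AP answers with possibly stale local data. */
    Optional<String> read() {
        if (partitioned && mode == Mode.CP) {
            return Optional.empty(); // unavailable, but never stale
        }
        return Optional.of(localValue); // available, may be stale
    }

    /** CP rejects writes it cannot coordinate; AP accepts them locally for later reconciliation. */
    boolean write(String value) {
        if (partitioned && mode == Mode.CP) {
            return false;
        }
        localValue = Objects.requireNonNull(value);
        return true;
    }
}
```

While no partition exists, both modes behave identically, which is why the trade-off only shows up when the network fails.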
Trade-offs in Practice
CP Systems (Consistency + Partition Tolerance)
- Prioritize correctness over availability.
- Example: Banking systems (better to reject a transaction than risk inconsistency).
- Tools: ZooKeeper, HBase.
AP Systems (Availability + Partition Tolerance)
- Prioritize uptime over strict consistency.
- Example: E-commerce cart (acceptable if two nodes briefly see slightly different carts).
- Tools: Cassandra, DynamoDB.
CA Systems (Consistency + Availability)
- Only possible if no partitions (single node, tightly coupled).
- Example: Traditional RDBMS on one server.
Strategies for Scalability & Availability
1. Caching
- Use Redis or Memcached to offload read-heavy workloads.
- ✅ Improves latency. ❌ Must handle cache invalidation.
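One common way to apply this is the cache-aside pattern: read from the cache first, fall back to the source on a miss, and invalidate when the data changes. The sketch below uses a `ConcurrentHashMap` as a stand-in for Redis or Memcached, and the `loader` function is a hypothetical placeholder for a database query.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Cache-aside sketch. The ConcurrentHashMap is a stand-in for Redis or Memcached.
final class CacheAside<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // hypothetical slow lookup, e.g. a database query

    CacheAside(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached value, loading and caching it on a miss. */
    V get(K key) {
        return cache.computeIfAbsent(key, loader);
    }

    /** Call after the underlying record changes so the next read reloads it. */
    void invalidate(K key) {
        cache.remove(key);
    }
}
```

Forgetting to call `invalidate` after a write is exactly the cache invalidation problem noted above: readers keep seeing the old value until the entry is removed or expires.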
2. Database Sharding
- Split large DB into smaller shards.
- Example: User IDs 1–1,000,000 in DB1, 1,000,001–2,000,000 in DB2.
- ✅ Scale horizontally. ❌ Increases complexity.
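Range-based routing like the example above can be sketched in a few lines. The `RangeShardRouter` class and the "DB1", "DB2" naming are illustrative assumptions; production systems often use hash-based or directory-based sharding to avoid hot ranges.

```java
// Hypothetical range-based shard router: each shard owns a fixed block of user IDs.
final class RangeShardRouter {
    private final long idsPerShard;
    private final int shardCount;

    RangeShardRouter(long idsPerShard, int shardCount) {
        if (idsPerShard <= 0 || shardCount <= 0) {
            throw new IllegalArgumentException("idsPerShard and shardCount must be positive");
        }
        this.idsPerShard = idsPerShard;
        this.shardCount = shardCount;
    }

    /** Maps a user ID (starting at 1) to the name of the shard that owns it. */
    String shardFor(long userId) {
        if (userId < 1) {
            throw new IllegalArgumentException("user IDs start at 1");
        }
        long index = (userId - 1) / idsPerShard; // IDs 1..idsPerShard land on index 0
        if (index >= shardCount) {
            throw new IllegalArgumentException("no shard configured for user " + userId);
        }
        return "DB" + (index + 1);
    }
}
```

With `new RangeShardRouter(1_000_000, 2)`, user 1,000,000 maps to DB1 and user 1,000,001 maps to DB2, so each ID belongs to exactly one shard.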
3. Replication
- Leader-follower replication (also called master-slave or primary-replica): the leader accepts writes and followers copy them.
- ✅ High availability. ❌ Risk of replication lag.
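Replication lag is easiest to see in a toy model where the follower applies changes only when a replication step runs. The `LeaderFollowerStore` class below is a hypothetical in-memory sketch, not how any particular database implements replication.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Toy leader-follower store: writes hit the leader first and reach the follower later.
final class LeaderFollowerStore {
    private final Map<String, String> leader = new HashMap<>();
    private final Map<String, String> follower = new HashMap<>();
    private final Queue<Map.Entry<String, String>> replicationLog = new ArrayDeque<>();

    /** Writes go to the leader and are queued for the follower. */
    void write(String key, String value) {
        leader.put(key, value);
        replicationLog.add(Map.entry(key, value));
    }

    String readFromLeader(String key) {
        return leader.get(key);
    }

    /** May return null or an old value until replicate() has caught up. */
    String readFromFollower(String key) {
        return follower.get(key);
    }

    /** Applies all pending changes to the follower, in write order. */
    void replicate() {
        Map.Entry<String, String> change;
        while ((change = replicationLog.poll()) != null) {
            follower.put(change.getKey(), change.getValue());
        }
    }
}
```

A read from the follower between `write` and `replicate` sees stale data, which is why reads that must observe their own writes are usually sent to the leader.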
4. Asynchronous Processing
- Use message queues (Kafka, RabbitMQ) for decoupling.
- ✅ Smooths spikes in workload. ❌ Results become eventually consistent, because work completes after the request returns.
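The decoupling can be sketched with a bounded in-process queue. Here `LinkedBlockingQueue` stands in for Kafka or RabbitMQ, and the `OrderQueue` class is a hypothetical illustration: producers enqueue and return immediately, while a consumer drains the backlog at its own pace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// In-process stand-in for a message queue such as Kafka or RabbitMQ.
final class OrderQueue {
    private final BlockingQueue<String> queue;

    OrderQueue(int capacity) {
        this.queue = new LinkedBlockingQueue<>(capacity);
    }

    /** Producer side: returns immediately; false signals back-pressure when the queue is full. */
    boolean submit(String orderId) {
        return queue.offer(orderId);
    }

    /** Consumer side: processes everything currently queued and returns how many were handled. */
    int drain(Consumer<String> handler) {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch);
        batch.forEach(handler);
        return batch.size();
    }
}
```

Between `submit` and `drain` the order is accepted but not yet processed, which is the window in which the rest of the system sees an eventually consistent view.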
5. Stateless Services
- Keep application servers stateless.
- ✅ Easy to scale horizontally. ❌ Requires external session store if needed.
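When session data is needed, moving it behind an interface keeps the service instances themselves stateless. In this sketch the `SessionStore` interface, the `InMemorySessionStore` stand-in (playing the role of Redis or a database), and the `LoginHandler` class are all hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical abstraction over an external session store.
interface SessionStore {
    void put(String sessionId, String user);
    String get(String sessionId);
}

// Stand-in for Redis or a database; a real deployment would call a network service.
final class InMemorySessionStore implements SessionStore {
    private final Map<String, String> sessions = new ConcurrentHashMap<>();

    @Override
    public void put(String sessionId, String user) {
        sessions.put(sessionId, user);
    }

    @Override
    public String get(String sessionId) {
        return sessions.get(sessionId);
    }
}

// Stateless handler: its only field is a reference to the shared store.
final class LoginHandler {
    private final SessionStore sessions;

    LoginHandler(SessionStore sessions) {
        this.sessions = sessions;
    }

    void login(String sessionId, String user) {
        sessions.put(sessionId, user);
    }

    /** Returns the logged-in user, or null if the session is unknown. */
    String whoAmI(String sessionId) {
        return sessions.get(sessionId);
    }
}
```

Two `LoginHandler` instances built on the same store can serve alternating requests from one user, so the load balancer needs no sticky sessions.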
Real-World Case Studies
1. Amazon
- Problem: Handle Black Friday traffic surges.
- Solution: Horizontal scaling via stateless microservices, DynamoDB for AP trade-off.
- Result: Elastic scaling with high availability.
2. Netflix
- Problem: Global availability for streaming.
- Solution: Replication across multiple AWS regions. Use of Cassandra (AP).
- Result: High uptime, even during regional outages.
3. Banking Systems
- Problem: Must prioritize correctness of balances.
- Solution: CP systems with strong consistency, often sacrificing availability during partitions.
- Result: Reliability over speed.
Extended Java Case Study
Scenario: Order Processing
Non-Scalable Design
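import java.util.ArrayList;
import java.util.List;

// Stateful service (hard to scale)
public class OrderService {
    // In-memory state lives only in this JVM, so other instances never see it
    private final List<Order> cache = new ArrayList<>();
    public void placeOrder(Order order) { cache.add(order); }
}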
❌ Tied to one machine.
❌ Can’t scale horizontally.
Scalable Design
// Stateless service + external store
public class OrderService {
    private final OrderRepository repo;

    public OrderService(OrderRepository repo) { this.repo = repo; }

    public void placeOrder(Order order) { repo.save(order); }
}
✅ Stateless → multiple service instances can run behind a load balancer.
✅ Repository backed by distributed DB.
Common Pitfalls
Shared State in Services
- Blocks horizontal scaling.
Over-Reliance on Strong Consistency
- Leads to poor availability in distributed environments.
Ignoring Partition Tolerance
- Designing as if networks never fail.
Premature Optimization
- Over-engineering scalability before real demand.
Interview Prep
Q1: What’s the difference between vertical and horizontal scaling?
Answer: Vertical scaling adds resources to one machine. Horizontal scaling adds more machines to share the load.
Q2: Explain CAP theorem.
Answer: A distributed system cannot guarantee all three of Consistency, Availability, and Partition Tolerance at once. Since partitions are unavoidable, a system must choose between C and A while a partition lasts.
Q3: Give an example of AP vs CP trade-off.
Answer: Banking → CP (consistency first). E-commerce carts → AP (availability first).
Q4: How do you design stateless services in Java?
Answer: Keep no local state in the service; rely on an external DB or cache instead. Multiple instances behind a load balancer can then handle requests independently.
Q5: What are common scalability patterns?
Answer: Caching, sharding, replication, asynchronous processing, stateless design.
Visualizing Scalable Architecture
graph TD
LB[Load Balancer] --> A[Service Instance 1]
LB --> B[Service Instance 2]
LB --> C[Service Instance 3]
A --> DB[(Distributed Database)]
B --> DB
C --> DB
A --> Cache[(Redis Cache)]
B --> Cache
C --> Cache
✅ Stateless services.
✅ Distributed DB with replication.
✅ Cache layer for performance.
Key Takeaways
- Scalability → ability to grow workload capacity.
- Availability → ability to stay operational.
- CAP theorem forces trade-offs in distributed systems.
- Strategies: caching, sharding, replication, stateless design, async processing.
- Real systems (Amazon, Netflix, Banking) balance principles differently based on domain.