Failover Strategies — Active-Passive vs Active-Active
In distributed systems, failover ensures that if one component fails, another can take over with minimal disruption.
Two common failover strategies are active-passive and active-active.
1. Active-Passive Failover
How It Works
- One node is active (handles traffic).
- Another node is passive/standby, replicating data from active.
- If active fails → passive promoted to active.
Advantages
- Simpler to implement.
- Easier to reason about consistency.
- Lower operational cost (only one active at a time).
Disadvantages
- Failover delay (detection + promotion time).
- Standby resources underutilized.
- Risk of data loss if replication lags.
Example
- PostgreSQL with hot standby.
- Many traditional RDBMS clusters.
2. Active-Active Failover
How It Works
- Multiple nodes are active simultaneously.
- Traffic distributed via load balancer.
- If one fails, others continue seamlessly.
Advantages
- No downtime (continuous availability).
- Better resource utilization.
- Supports geo-distribution (multi-region).
Disadvantages
- More complex (conflict resolution needed).
- Higher operational cost (all nodes active).
- Requires strong coordination (e.g., consensus, quorum).
Example
- DynamoDB, Cassandra (multi-master setups).
- Google Spanner global clusters.
3. Active-Passive vs Active-Active — Comparison
Feature | Active-Passive | Active-Active |
---|---|---|
Availability | Failover delay | Continuous availability |
Resource usage | Standby underutilized | All nodes utilized |
Complexity | Simple | Complex (conflicts, sync) |
Cost | Lower | Higher |
Use case | Smaller systems, RDBMS | Large-scale, globally distributed |
4. Failover Detection
- Heartbeats: active node sends periodic signals.
- Timeouts: if heartbeats missed, promote standby.
- Orchestrators: ZooKeeper, etcd, Kubernetes controllers manage failover.
5. Interview Tips
- Mention failover when discussing HA systems.
- Say: “For smaller systems, I’d use active-passive. For global scale, active-active is better.”
- Call out trade-offs: active-passive simpler, active-active complex but resilient.
- Tie into replication strategies (sync vs async).
6. Next Steps
- Explore Geo-replication & Multi-region.
- Learn about Graceful Degradation.