Skip to content

Failover Strategies — Active-Passive vs Active-Active

In distributed systems, failover ensures that if one component fails, another can take over with minimal disruption.
Two common failover strategies are active-passive and active-active.


1. Active-Passive Failover

How It Works

  • One node is active (handles traffic).
  • Another node is passive/standby, replicating data from active.
  • If active fails → passive promoted to active.

Advantages

  • Simpler to implement.
  • Easier to reason about consistency.
  • Lower operational cost (only one active at a time).

Disadvantages

  • Failover delay (detection + promotion time).
  • Standby resources underutilized.
  • Risk of data loss if replication lags.

Example

  • PostgreSQL with hot standby.
  • Many traditional RDBMS clusters.

2. Active-Active Failover

How It Works

  • Multiple nodes are active simultaneously.
  • Traffic distributed via load balancer.
  • If one fails, others continue seamlessly.

Advantages

  • No downtime (continuous availability).
  • Better resource utilization.
  • Supports geo-distribution (multi-region).

Disadvantages

  • More complex (conflict resolution needed).
  • Higher operational cost (all nodes active).
  • Requires strong coordination (e.g., consensus, quorum).

Example

  • DynamoDB, Cassandra (multi-master setups).
  • Google Spanner global clusters.

3. Active-Passive vs Active-Active — Comparison

FeatureActive-PassiveActive-Active
AvailabilityFailover delayContinuous availability
Resource usageStandby underutilizedAll nodes utilized
ComplexitySimpleComplex (conflicts, sync)
CostLowerHigher
Use caseSmaller systems, RDBMSLarge-scale, globally distributed

4. Failover Detection

  • Heartbeats: active node sends periodic signals.
  • Timeouts: if heartbeats missed, promote standby.
  • Orchestrators: ZooKeeper, etcd, Kubernetes controllers manage failover.

5. Interview Tips

  • Mention failover when discussing HA systems.
  • Say: “For smaller systems, I’d use active-passive. For global scale, active-active is better.”
  • Call out trade-offs: active-passive simpler, active-active complex but resilient.
  • Tie into replication strategies (sync vs async).

6. Next Steps


Connect: LinkedIn

© 2025 Official CTO. All rights reserved.