
Graceful Degradation

When systems fail, they shouldn’t fail completely.
Graceful degradation means continuing to provide partial service instead of suffering a full outage.


1. Why Graceful Degradation?

  • Improves user experience during failures.
  • Maintains business continuity (some features still usable).
  • Builds resilience into large-scale systems.

2. Examples of Graceful Degradation

  1. E-commerce

    • Checkout service down → users can still browse products.
    • Payment gateway down → retry the payment later or route it to an alternate gateway.
  2. Streaming (Netflix, YouTube)

    • HD unavailable → fall back to SD video.
    • Backend slow → keep serving cached recommendations.
  3. Social Media

    • Feed service down → still allow posting or messaging.
    • Show a cached timeline if live updates are unavailable.
  4. Maps & Navigation

    • Live traffic unavailable → still show static map.
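The streaming case above (HD unavailable → SD) can be sketched as a walk down a quality ladder. This is a minimal Python sketch under stated assumptions: the rendition labels, the bitrates, and the `pick_rendition` helper are hypothetical, and real players typically make this choice inside their adaptive-bitrate logic rather than with a single lookup.

```python
from typing import Optional

# Hypothetical quality ladder, best first: (label, required bandwidth in kbit/s).
# Labels and bitrates are illustrative assumptions, not any service's real values.
QUALITY_LADDER = [
    ("1080p", 5000),
    ("720p", 3000),
    ("480p", 1000),
    ("240p", 400),
]


def pick_rendition(healthy: set[str], bandwidth_kbps: int) -> Optional[str]:
    """Return the best rendition that is both healthy and fits the bandwidth.

    Steps down the ladder one level at a time; returns None only when
    nothing at all can be served, which is the single hard-failure case.
    """
    for label, required_kbps in QUALITY_LADDER:
        if label in healthy and required_kbps <= bandwidth_kbps:
            return label
    return None
```

For example, if only the SD encodes are healthy, `pick_rendition({"480p", "240p"}, 8000)` still returns `"480p"`. Every non-empty result is a degraded but working stream.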

3. Strategies for Graceful Degradation

  • Caching: serve stale data when fresh data is unavailable.
  • Fallbacks: switch to a backup system or to reduced functionality.
  • Feature flags: disable non-critical features dynamically.
  • Partial results: return partial data instead of failing the whole request.
  • Circuit breakers: stop calling a failing dependency for a while so its failure does not cascade to its callers.
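Caching, fallbacks, and circuit breakers often work together. The following is one possible Python arrangement, assuming the dependency is exposed as a plain `fetch(key)` function. `CircuitBreaker` and `DegradingClient` are hypothetical names and the thresholds are illustrative. After repeated failures the breaker stops calling the dependency, and the client serves the last cached value, flagged as stale.

```python
import time
from typing import Any, Callable, Optional


class CircuitBreaker:
    """Opens after `max_failures` consecutive failures; after `reset_after`
    seconds it lets a trial call through again (the half-open state)."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock  # injectable so tests need not sleep
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # closed: calls flow normally
        # Open: block calls until the cool-down has elapsed.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None  # close the breaker

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()  # open, or re-open after a failed trial


class DegradingClient:
    """Fresh data when the dependency answers, last cached value otherwise."""

    def __init__(self, fetch: Callable[[str], Any], breaker: CircuitBreaker):
        self.fetch = fetch
        self.breaker = breaker
        self.cache: dict[str, Any] = {}

    def get(self, key: str, default: Any = None) -> tuple[Any, bool]:
        """Return (value, is_stale)."""
        if self.breaker.allow():
            try:
                value = self.fetch(key)
            except Exception:
                self.breaker.record_failure()
            else:
                self.breaker.record_success()
                self.cache[key] = value
                return value, False
        # Breaker open or call failed: serve stale data, else the default.
        return self.cache.get(key, default), True
```

The `is_stale` flag lets the UI tell users they are seeing older data. This sketch is not thread-safe and admits every caller once the cool-down expires; a production breaker would usually admit a single trial request while half-open.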

4. Benefits vs Trade-offs

Benefit                   | Trade-off
--------------------------|------------------------------------------
Improves availability     | Users may see outdated data
Reduces downtime impact   | Extra complexity to design
Enhances resilience       | Not always possible for critical features

5. Real-World Systems

  • Netflix → Chaos Engineering experiments (e.g., Chaos Monkey) deliberately inject failures to verify that services degrade gracefully.
  • Amazon → product browsing continues even if the recommendation service fails.
  • Slack → allows messaging even when file uploads fail.
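The Amazon-style behavior above is essentially the partial-results strategy: assemble a page from independent sections and drop the optional ones that fail. A minimal Python sketch, where `build_page`, `CriticalDependencyError`, and the section names are hypothetical:

```python
from typing import Any, Callable


class CriticalDependencyError(Exception):
    """Raised when a section the page cannot render without has failed."""


def build_page(components: dict[str, tuple[Callable[[], Any], bool]]) -> dict[str, Any]:
    """Assemble a page from sections mapped to (loader, is_critical).

    Optional sections that fail are omitted and listed under "degraded";
    a failing critical section fails the whole request.
    """
    page: dict[str, Any] = {"sections": {}, "degraded": []}
    for name, (loader, is_critical) in components.items():
        try:
            page["sections"][name] = loader()
        except Exception as exc:
            if is_critical:
                # Core content is missing: there is nothing useful to show.
                raise CriticalDependencyError(name) from exc
            # Optional content is missing: omit it and record the degradation.
            page["degraded"].append(name)
    return page
```

The loaders run one after another here for clarity; a real page service would more likely call them concurrently with per-call timeouts, so a slow optional dependency cannot delay the core content.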

6. Interview Tips

  • Always mention graceful degradation for resiliency questions.
  • Say: “If payment fails, the system should still let users browse products.”
  • Tie to user experience: “Better a degraded system than a down system.”
  • Mention feature prioritization (critical vs optional).

7. Diagram

   [ Normal Service ] → All features work.
   [ Failure Detected ] → Non-critical features disabled, core continues.
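The two states in the diagram map naturally onto a feature-flag switch keyed by criticality. A minimal Python sketch, assuming an in-process flag store; `DegradationSwitch` and the feature names are hypothetical, and a real system would usually flip the mode from a health check or a remote flag service rather than by direct method calls.

```python
from dataclasses import dataclass


@dataclass
class DegradationSwitch:
    """Hypothetical in-process flag store with two modes: normal and degraded.

    `features` maps each feature name to True if it is critical (core).
    """

    features: dict[str, bool]
    degraded: bool = False

    def on_failure_detected(self) -> None:
        # Enter degraded mode: non-critical features switch off.
        self.degraded = True

    def on_recovery(self) -> None:
        # Back to normal: every known feature is available again.
        self.degraded = False

    def is_enabled(self, feature: str) -> bool:
        if feature not in self.features:
            return False  # unknown features default to off
        # Critical features are always on; optional ones only in normal mode.
        return self.features[feature] or not self.degraded

    def enabled_features(self) -> list[str]:
        return [name for name in self.features if self.is_enabled(name)]
```

For example, with `{"browse": True, "search": True, "recommendations": False, "live_chat": False}`, calling `on_failure_detected()` leaves only `browse` and `search` enabled until `on_recovery()` is called.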


© 2025 Official CTO. All rights reserved.