Polyglot Persistence & System Design Patterns
Modern applications rarely rely on a single type of database.
Instead, they use polyglot persistence — choosing the best database for each workload within the same system.
This approach is common at companies that run large-scale systems, such as Netflix, Uber, and Instagram.
1. What is Polyglot Persistence?
- Polyglot persistence = using multiple databases in one system.
- Each database is chosen based on the workload it serves:
  - Relational DB for transactions that need ACID guarantees.
  - NoSQL for horizontal scalability and flexible schemas.
  - Specialized DB for search, caching, or time-series data.
Analogy: Just as you use different tools in a workshop (hammer, screwdriver, wrench), you use different databases for different tasks.
2. Why Polyglot Persistence?
- Workload Diversity: Different modules have different needs (transactions vs analytics vs search).
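- Scalability: A database built for one access pattern (for example, high-volume writes or full-text queries) usually scales that pattern further than a general-purpose one.
- Cost Optimization: Moving caching, search, and analytics off the primary database avoids scaling one expensive instance to cover every workload.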
- Flexibility: Adopt new technologies without rewriting the whole system.
3. Real-World Examples
Netflix
- Cassandra → for user viewing history (write-heavy).
- MySQL → for billing and financial records (ACID).
- Elasticsearch → for search and recommendations.
Uber
- MySQL → core trips and payments.
- Cassandra → geospatial + high-write workloads.
- Redis → caching and session management.
- Elasticsearch → log search and analytics.
Instagram
- PostgreSQL → core relational data.
- Memcached → caching.
- Elasticsearch → full-text search.
4. Common Polyglot Patterns
4.1 Transactional + Analytical Split
- Transactional DB (OLTP): Many small, fast inserts/updates (Postgres, MySQL).
- Analytical DB (OLAP): Complex aggregate queries over large datasets (Snowflake, BigQuery, Redshift).
👉 Example: E-commerce site uses MySQL for orders, Snowflake for sales analysis.
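The split above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: two in-memory SQLite databases stand in for MySQL and Snowflake, and the table names, the helper functions (`make_stores`, `place_order`, `run_batch_load`, `revenue_by_product`), and the full-rebuild load strategy are all assumptions made for the example.

```python
import sqlite3


def make_stores():
    """Create two in-memory SQLite databases: one plays the OLTP role, one the OLAP role."""
    oltp = sqlite3.connect(":memory:")
    olap = sqlite3.connect(":memory:")
    oltp.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, product TEXT, amount_cents INTEGER, day TEXT)"
    )
    olap.execute(
        "CREATE TABLE daily_sales (day TEXT, product TEXT, revenue_cents INTEGER, order_count INTEGER)"
    )
    return oltp, olap


def place_order(oltp, product, amount_cents, day):
    """OLTP path: one small write per customer action."""
    with oltp:  # commits on success, rolls back on error
        cur = oltp.execute(
            "INSERT INTO orders (product, amount_cents, day) VALUES (?, ?, ?)",
            (product, amount_cents, day),
        )
    return cur.lastrowid


def run_batch_load(oltp, olap):
    """Batch step: aggregate the orders, then rebuild the analytical table from scratch."""
    rows = oltp.execute(
        "SELECT day, product, SUM(amount_cents), COUNT(*) FROM orders GROUP BY day, product"
    ).fetchall()
    with olap:
        olap.execute("DELETE FROM daily_sales")
        olap.executemany("INSERT INTO daily_sales VALUES (?, ?, ?, ?)", rows)


def revenue_by_product(olap):
    """OLAP path: an analytical query that never touches the orders table."""
    return dict(
        olap.execute("SELECT product, SUM(revenue_cents) FROM daily_sales GROUP BY product")
    )


oltp, olap = make_stores()
place_order(oltp, "book", 1500, "2024-01-01")
place_order(oltp, "book", 2500, "2024-01-02")
place_order(oltp, "lamp", 4000, "2024-01-02")
run_batch_load(oltp, olap)
print(revenue_by_product(olap))
```

Until `run_batch_load` runs, the analytical side sees no new orders. That delay is the price of keeping heavy queries away from the transactional database.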
4.2 Search + Relational
- Relational DB: User accounts, product catalog.
- Search Engine: Full-text search (Elasticsearch).
👉 Example: An online store keeps its structured product data in an RDBMS and indexes it into Elasticsearch for product search.
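A rough sketch of how the two stores cooperate: the search side returns matching ids, and the relational side supplies the authoritative rows. Everything here is an assumption made for illustration. `ToySearchIndex` is a hypothetical in-memory inverted index standing in for a real search engine, SQLite stands in for the RDBMS, and the `products` table is invented for the example.

```python
import sqlite3
from collections import defaultdict


class ToySearchIndex:
    """In-memory inverted index; a stand-in for a search engine such as Elasticsearch."""

    def __init__(self):
        self._postings = defaultdict(set)  # term -> set of document ids

    def index(self, doc_id, text):
        for term in text.lower().split():
            self._postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents that contain every query term (AND semantics)."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result


def make_catalog_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price_cents INTEGER)")
    return db


def add_product(db, index, name, price_cents):
    """Write to the relational source of truth first, then update the search index."""
    with db:
        cur = db.execute(
            "INSERT INTO products (name, price_cents) VALUES (?, ?)", (name, price_cents)
        )
    index.index(cur.lastrowid, name)
    return cur.lastrowid


def search_products(db, index, query):
    """Ask the index for matching ids, then load the current rows from the database."""
    ids = sorted(index.search(query))
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    sql = f"SELECT id, name, price_cents FROM products WHERE id IN ({placeholders}) ORDER BY id"
    return db.execute(sql, ids).fetchall()


db, index = make_catalog_db(), ToySearchIndex()
add_product(db, index, "red running shoes", 5999)
add_product(db, index, "blue running jacket", 8999)
print(search_products(db, index, "red running"))
```

Reading prices from the database rather than from the index means a stale index can return the wrong set of matches, but never a wrong price.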
4.3 Cache + Primary DB
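- Cache (Redis/Memcached): Serves hot data from memory so repeated reads skip the primary database.
- Relational/NoSQL DB: Source of truth; the cache can always be rebuilt from it.
👉 Example: Twitter has described caching home timelines in Redis, while durable databases such as MySQL store the underlying tweets.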
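One common way to wire a cache to its primary database is the cache-aside approach: read from the cache, fall back to the database on a miss, and invalidate the cached entry on writes. The sketch below assumes that approach. `TTLCache` is a hypothetical in-process stand-in for Redis or Memcached, SQLite stands in for the primary database, and the `users` table and `UserRepository` class are invented for the example.

```python
import sqlite3
import time


class TTLCache:
    """Tiny in-process cache with per-entry expiry; a stand-in for Redis or Memcached."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)

    def delete(self, key):
        self._entries.pop(key, None)


class UserRepository:
    def __init__(self, db, cache):
        self.db = db
        self.cache = cache
        self.db_reads = 0  # counts how often we had to hit the database

    def get_name(self, user_id):
        key = f"user:{user_id}"
        cached = self.cache.get(key)
        if cached is not None:
            return cached  # cache hit
        self.db_reads += 1  # cache miss: go to the source of truth
        row = self.db.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
        if row is None:
            return None
        self.cache.set(key, row[0])
        return row[0]

    def rename(self, user_id, new_name):
        """Update the database first, then drop the stale cache entry."""
        with self.db:
            self.db.execute("UPDATE users SET name = ? WHERE id = ?", (new_name, user_id))
        self.cache.delete(f"user:{user_id}")


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES (1, 'Ada')")
repo = UserRepository(db, TTLCache(ttl_seconds=30))
repo.get_name(1)  # miss: reads the database and fills the cache
repo.get_name(1)  # hit: served from the cache
print(repo.db_reads)  # prints 1
```

The TTL acts as a safety net: even if an invalidation is lost, a stale entry lives no longer than `ttl_seconds`.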
4.4 Hybrid Event-Driven
- Event Store (Kafka/EventStore): Capture system events.
- Multiple Databases consume events for specialized storage.
👉 Example: A banking system streams events to:
  - Cassandra (audit logs)
  - Postgres (transactions)
  - Elasticsearch (fraud detection)
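The fan-out in the banking example can be sketched with an append-only log and three independent consumers. This is only an illustration under stated assumptions: `EventLog` is a hypothetical in-memory stand-in for a Kafka topic, the three consumer classes keep their data in plain Python structures instead of Cassandra, Postgres, and Elasticsearch, and the fraud rule (flag withdrawals above a fixed limit) is invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    account: str
    kind: str  # "deposit" or "withdrawal"
    amount_cents: int


class EventLog:
    """Append-only list of events; a stand-in for a single-partition event stream."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def read_from(self, offset):
        return self._events[offset:]


class Consumer:
    """Tracks its own offset, so each downstream store catches up independently."""

    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self):
        for event in self._log.read_from(self.offset):
            self.handle(event)
            self.offset += 1

    def handle(self, event):
        raise NotImplementedError


class AuditTrail(Consumer):  # the role Cassandra plays in the example
    def __init__(self, log):
        super().__init__(log)
        self.entries = []

    def handle(self, event):
        self.entries.append((event.account, event.kind, event.amount_cents))


class Balances(Consumer):  # the role Postgres plays in the example
    def __init__(self, log):
        super().__init__(log)
        self.by_account = {}

    def handle(self, event):
        sign = 1 if event.kind == "deposit" else -1
        current = self.by_account.get(event.account, 0)
        self.by_account[event.account] = current + sign * event.amount_cents


class FraudFlags(Consumer):  # the role Elasticsearch plays in the example
    LIMIT_CENTS = 500_000  # arbitrary threshold chosen for the sketch

    def __init__(self, log):
        super().__init__(log)
        self.flagged = []

    def handle(self, event):
        if event.kind == "withdrawal" and event.amount_cents > self.LIMIT_CENTS:
            self.flagged.append(event)


log = EventLog()
audit, balances, fraud = AuditTrail(log), Balances(log), FraudFlags(log)
log.append(Event("acct-1", "deposit", 1_000_000))
log.append(Event("acct-1", "withdrawal", 600_000))
for consumer in (audit, balances, fraud):
    consumer.poll()
print(balances.by_account, len(fraud.flagged))
```

Because each consumer keeps its own offset, a slow or temporarily offline store does not block the others; it simply lags behind until its next `poll`.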
5. Challenges in Polyglot Persistence
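- Data Consistency: No single transaction spans multiple databases, so copies can drift out of sync and are usually only eventually consistent.
- Increased Complexity: More moving parts = harder operations.
- Operational Overhead: Monitoring, backups, and scaling each DB separately.
- Latency Trade-offs: Propagating a change to other DBs takes time, so caches, search indexes, and analytical stores may briefly serve stale data.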
👉 Managing these trade-offs requires careful system design, often built around an event-driven architecture.
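One widely used way to reduce the consistency risk is the transactional outbox: the business row and the event describing it are written in the same local transaction, and a separate relay forwards the events afterwards. The text does not prescribe this technique; it is shown here only as one possible sketch. SQLite stands in for the primary database, and the `orders` and `outbox` tables, the helper names, and the `publish` callback (a stand-in for a message broker client) are all assumptions.

```python
import json
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
    db.execute(
        "CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT, published INTEGER DEFAULT 0)"
    )
    return db


def create_order(db, item, fail_before_commit=False):
    """Insert the order and its event in ONE local transaction: both commit or neither does."""
    with db:  # commits on success, rolls back if an exception escapes
        cur = db.execute("INSERT INTO orders (item) VALUES (?)", (item,))
        order_id = cur.lastrowid
        payload = json.dumps({"type": "order_created", "order_id": order_id, "item": item})
        db.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))
        if fail_before_commit:
            raise RuntimeError("simulated crash before commit")
    return order_id


def relay(db, publish):
    """Forward unpublished events in order; mark each one only after publish() returns."""
    rows = db.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))  # a crash after this line causes a re-send later
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)


db = make_db()
create_order(db, "book")
sent = []  # stand-in for the broker: we just collect the events
relay(db, sent.append)
print(sent)
```

The relay gives at-least-once delivery, so downstream consumers should tolerate seeing the same event twice.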
6. Interview Tips
When asked “Which database would you choose?”:
- Don’t restrict yourself to just SQL or NoSQL.
- Show awareness of polyglot persistence.
- Tie DB choice to workload.
👉 Example Answer:
“For an e-commerce system, I’d use a relational DB for orders and payments, a document DB for product catalog, and Elasticsearch for search. To improve scalability, I’d add Redis for caching. This is polyglot persistence — using the right tool for each workload.”
7. Recap
- Polyglot persistence = using multiple DBs in one system.
- Real-world companies (Netflix, Uber, Instagram) rely on it.
- Common patterns:
  - Transactional + Analytical split.
  - Search + Relational.
  - Cache + Primary DB.
  - Event-driven hybrids.
- Challenges: data consistency, complexity, ops overhead, sync latency.
- Interview: Always justify DB choice by workload.
Next Steps
👉 Continue with Case Studies to see how companies like Twitter, WhatsApp, and others applied these principles at scale.