Skip to content

Sharding vs Replication

When databases grow beyond a single machine, we need scaling strategies.
The two most important ones are Replication and Sharding.
They solve different problems and are often used together in real-world systems.


1. Replication

What is Replication?

Replication = copying the same data across multiple machines.

Types

  1. Primary-Secondary (Master-Slave)

    • Writes → Primary.
    • Reads → Replicas.
    • Example: MySQL with read replicas.
  2. Multi-Primary (Multi-Master)

    • Writes allowed on multiple nodes.
    • Requires conflict resolution (last-write-wins, version vectors).
    • Example: MongoDB Replica Sets, PostgreSQL BDR.

Benefits

  • High Availability: If one node fails, another serves data.
  • Read Scalability: Distribute reads across replicas.
  • Disaster Recovery: Backups from replicas.
  • Geo-Distribution: Place replicas closer to users.

Downsides

  • Replication lag (replicas behind primary).
  • Conflict resolution (multi-master).
  • Extra storage cost.

2. Sharding

What is Sharding?

Sharding = splitting data across multiple machines.
Each shard stores only a subset of data.

Example:

  • Shard 1 → Users A–M
  • Shard 2 → Users N–Z

Strategies

  • Range-based: Based on value range (user_id 1–1000, 1001–2000).
  • Hash-based: Hash of key determines shard.
  • Directory-based: Lookup service decides which shard to query.

Benefits

  • Write Scalability: Distributes load across servers.
  • Storage Scalability: Data split across machines.
  • Parallelism: Queries can be processed by multiple shards.

Downsides

  • Complex query routing (joins across shards are expensive).
  • Rebalancing shards is hard (when one shard grows too large = hotspot).
  • Application complexity (must know shard key).

3. Sharding vs Replication (Comparison)

FeatureReplicationSharding
DefinitionCopies of the same data on multiple nodesSplits data across multiple nodes
ImprovesAvailability, Read scalabilityWrite scalability, Storage capacity
Data StoredFull copy on each replicaSubset on each shard
Query RoutingAny replica can serve readsMust find the correct shard
Failure ToleranceHigh (replicas act as backup)Lower (if a shard fails, partial data lost)
Use CaseRead-heavy workloads, fault toleranceLarge datasets, write-heavy workloads

4. When to Use What

  • Replication

    • If your workload is read-heavy.
    • If you need high availability.
    • If your dataset fits on a single machine, but you want fault tolerance.
  • Sharding

    • If your workload is write-heavy.
    • If your dataset is too large for a single machine.
    • If you need parallel query processing.

5. Sharding + Replication (Together)

In practice, most large-scale systems use both:

  • Sharded cluster with replication inside each shard.
    • Each shard = subset of data.
    • Each shard has replicas for HA and read-scaling.

Example:

  • MongoDB Sharded Clusters.
  • Cassandra (replication + partitioning).
  • Google Spanner (sharding + synchronous replication).

6. Real-World Examples

  • Instagram → PostgreSQL with sharding + Memcached.
  • Twitter → Sharded MySQL for tweets.
  • Cassandra → Partitioned (sharded) data + replication for HA.
  • Spanner → Sharding + synchronous replication + global consistency.

7. Interview Tips

  • Clarify the bottleneck:

    • If reads → talk about replication.
    • If writes / dataset size → talk about sharding.
  • Mention trade-offs:

    • Replication lag.
    • Shard rebalancing & hotspots.
  • Bonus points:

    • Talk about combining both.
    • Mention CAP/PACELC links.

👉 Example Answer:
“If the system is read-heavy, I’d add replication to scale reads and ensure availability. If writes are the bottleneck or the dataset is too large for one server, I’d introduce sharding. In practice, I’d use both — a sharded cluster with replication inside each shard for HA.”


8. Recap

  • Replication = copies of the same data → improves availability & read scaling.
  • Sharding = splits data → improves write scaling & storage.
  • Both are complementary and often combined.

Next Steps

👉 Continue with Database Scaling Patterns for a broader view of scaling strategies beyond sharding and replication.


Connect: LinkedIn

© 2025 Official CTO. All rights reserved.