System Design: Replication and Data Consistency

Replication is the process of copying data from one database server (the primary) to one or more other servers (replicas). This ensures data availability, enables high-throughput reads, and provides fault tolerance. If the primary server fails, a replica can take over, minimizing downtime and data loss.

Think of replication like publishing a bestselling book. One author (primary) writes the original book. Publishers (replicas) print thousands of copies worldwide. Readers (clients) pick up a copy from the nearest location. Even if the original manuscript burns, copies exist everywhere.

Why Replication Is Essential

  • High Availability: If the primary database server crashes, a replica is promoted and keeps the system running.
  • Read Scaling: Read queries distribute across multiple replicas, dramatically increasing read throughput.
  • Geographic Distribution: Replicas placed near users in different regions reduce read latency.
  • Disaster Recovery: Data survives even if an entire data center goes offline.
  • Backup: Replicas serve as live backups without impacting primary performance.

Types of Replication

1. Single-Leader Replication (Master-Slave)

One server is the leader (primary/master). All writes go exclusively to the leader. The leader replicates changes to followers (replicas/slaves). Reads can go to any follower.

                  +-----------+
  All WRITES ---> |  Primary  |
                  | (Leader)  |
                  +-----------+
                   /    |    \
                  /     |     \
          +------+  +------+  +------+
          |Repl 1|  |Repl 2|  |Repl 3|
          +------+  +------+  +------+
            READ     READ      READ

Advantages:

  • Simple to understand and implement
  • Reads scale horizontally by adding more replicas
  • No write conflicts (one writer)

Disadvantages:

  • Leader is a write bottleneck
  • Followers may lag behind leader (replication lag)
  • Leader failure requires manual or automatic failover
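The routing rule behind single-leader replication is simple enough to sketch in a few lines: writes go exclusively to the leader, reads rotate across followers. A minimal illustration (class and server names are hypothetical, not from any particular database driver):

```python
import itertools

class SingleLeaderRouter:
    """Routes writes to the leader and spreads reads round-robin across followers."""

    def __init__(self, leader, followers):
        self.leader = leader
        self.followers = followers
        self._cycle = itertools.cycle(followers)  # rotate over replicas

    def route_write(self):
        # All writes go exclusively to the leader -- no write conflicts.
        return self.leader

    def route_read(self):
        # Reads can go to any follower; rotating spreads the load.
        return next(self._cycle)

router = SingleLeaderRouter("primary", ["replica-1", "replica-2", "replica-3"])
router.route_write()   # -> "primary"
router.route_read()    # -> "replica-1"
router.route_read()    # -> "replica-2"
```

In practice this routing usually lives in a connection proxy or the database driver, not application code, but the decision rule is the same.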

2. Multi-Leader Replication

Multiple servers accept writes. Each leader replicates its writes to all other leaders and their followers. This enables writes from multiple data centers simultaneously.

Data Center A:                    Data Center B:
+----------+                      +----------+
| Leader A | <----replicate---->  | Leader B |
+----------+                      +----------+
     |                                 |
  Replicas                          Replicas
  (reads)                           (reads)

Users in USA write to Leader A
Users in Europe write to Leader B
Both leaders sync with each other

Advantage: Low-latency writes from any region.

Problem: Write conflicts — what happens when the same data is updated simultaneously on Leader A and Leader B?
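One common (if lossy) resolution strategy is last-write-wins: every write carries a timestamp, and when two leaders hold conflicting versions, the later timestamp wins. A minimal sketch (the record shape and tie-breaking rule are illustrative assumptions):

```python
def resolve_last_write_wins(version_a, version_b):
    """Keep the version with the later timestamp; break ties by leader id."""
    key_a = (version_a["ts"], version_a["leader"])
    key_b = (version_b["ts"], version_b["leader"])
    return version_a if key_a >= key_b else version_b

# The same user's email was updated on both leaders at nearly the same time.
a = {"value": "alice@usa.example", "ts": 100, "leader": "A"}
b = {"value": "alice@eu.example",  "ts": 105, "leader": "B"}

winner = resolve_last_write_wins(a, b)   # -> b (newer timestamp)
# Note: Leader A's concurrent update is silently discarded -- last-write-wins
# trades data loss for simplicity. Alternatives include merge functions and CRDTs.
```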

3. Leaderless Replication

No single leader exists. Clients write to multiple replicas directly. As long as a configured number of replicas (a write quorum) confirm the write, it is considered successful. Amazon DynamoDB and Apache Cassandra use this model.

Write Request (W=2 out of 3):
Client → Replica 1 ✓
Client → Replica 2 ✓
Client → Replica 3 (unavailable)
→ 2/3 confirmations received = Write succeeds!

Read Request (R=2 out of 3):
Client reads from Replica 1 + Replica 2
Compares versions → Returns latest

Quorum reads and writes: choosing W + R > N guarantees that the read and write quorums overlap, so every read contacts at least one replica holding the latest write
(W = write acknowledgments required, R = replicas consulted per read, N = total replicas)
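The quorum scenario above (W=2, R=2, N=3, one replica down) can be simulated with in-memory dicts standing in for replica servers. A minimal sketch, not a real client library:

```python
N, W, R = 3, 2, 2   # W + R > N: read and write quorums must overlap

replicas = [{"up": True, "data": {}} for _ in range(N)]
replicas[2]["up"] = False   # Replica 3 is unavailable

def quorum_write(key, value, version):
    """Send the versioned write to every reachable replica; succeed on >= W acks."""
    acks = 0
    for rep in replicas:
        if rep["up"]:
            rep["data"][key] = (version, value)
            acks += 1
    return acks >= W

def quorum_read(key):
    """Gather responses from reachable replicas; need >= R, newest version wins."""
    responses = [rep["data"][key] for rep in replicas
                 if rep["up"] and key in rep["data"]]
    if len(responses) < R:
        raise RuntimeError("read quorum not reached")
    return max(responses)   # tuples compare by version first

quorum_write("photo", "new.jpg", version=2)   # 2/3 acks -> True, write succeeds
quorum_read("photo")                          # -> (2, "new.jpg")
```

Real leaderless stores add repair mechanisms (read repair, anti-entropy) on top of this basic quorum logic.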

Synchronous vs Asynchronous Replication

Synchronous Replication

The primary waits for all replicas to confirm they received and stored the data before acknowledging the write to the client. Guarantees no data loss but slows down writes.

Client writes data
    ↓
Primary stores data
    ↓
Primary waits for Replica 1 ACK ✓
Primary waits for Replica 2 ACK ✓
    ↓
Primary sends success to client

Guarantee: If primary crashes, replica has same data.
Cost: Write latency increases by replica round-trip time.
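The synchronous write path above can be sketched with in-memory dicts standing in for servers; the key point is that the client only hears "success" after every replica has acknowledged (all names here are illustrative):

```python
primary, replica_1, replica_2 = {}, {}, {}

def replicate(replica, key, value):
    """Simulated round trip: the replica stores the value and acknowledges."""
    replica[key] = value
    return True

def synchronous_write(key, value):
    """Store on the primary, then block until every replica acks."""
    primary[key] = value
    for replica in (replica_1, replica_2):
        if not replicate(replica, key, value):
            raise RuntimeError("replica failed to acknowledge; write aborted")
    return "success"   # only now does the client learn the write succeeded

synchronous_write("x", 5)
# After the call returns, every replica is guaranteed to hold x=5.
```

An asynchronous version would return "success" before the replicate loop, which is exactly where the data-loss window described below comes from.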

Asynchronous Replication

The primary acknowledges the write immediately after storing locally. It replicates to followers in the background. Much faster writes but carries a risk — if the primary crashes before replication completes, the unsynced data is lost.

Client writes data
    ↓
Primary stores data
    ↓
Primary sends success to client (IMMEDIATELY)
    ↓ (background)
Primary replicates to replicas (eventually)

Speed: Fast writes (no waiting for replicas)
Risk: Small window of potential data loss if primary crashes

Aspect         | Synchronous                            | Asynchronous
---------------|----------------------------------------|--------------------------------------
Write Speed    | Slower (waits for replicas)            | Faster (immediate acknowledgment)
Data Loss Risk | Zero (replicas always in sync)         | Small window of potential loss
Availability   | Lower (replica failure blocks writes)  | Higher (primary works independently)
Use Case       | Financial data, medical records        | Social media posts, analytics events

Replication Lag

Replication lag is the delay between a write on the primary and that write appearing on the replica. In asynchronous replication, replicas are always slightly behind the primary. This creates read inconsistency problems.

Timeline:
T=0: User writes new profile photo → Written to Primary
T=1: User immediately reads profile → Gets routed to Replica
     Replica has not received update yet → Returns OLD photo!

This is called the "read-your-own-writes" problem.

Solutions for replication lag issues:

Read-Your-Own-Writes Consistency

Rule: After a write, route that user's reads to the primary for 1 minute
      After 1 minute, replica lag likely resolved → safe to read from replica
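The rule above is a simple timestamp check per user. A minimal sketch (the 60-second window comes from the rule above; the function names are illustrative):

```python
import time

RECENT_WRITE_WINDOW = 60.0   # seconds; heuristic lag-recovery window

last_write_at = {}   # user_id -> timestamp of that user's last write

def record_write(user_id, now=None):
    last_write_at[user_id] = now if now is not None else time.time()

def choose_read_target(user_id, now=None):
    """Primary for users who wrote recently; otherwise any replica."""
    now = now if now is not None else time.time()
    if now - last_write_at.get(user_id, float("-inf")) < RECENT_WRITE_WINDOW:
        return "primary"    # their write may not have reached replicas yet
    return "replica"        # lag has likely resolved; replicas are safe

record_write("alice", now=1000.0)
choose_read_target("alice", now=1010.0)   # -> "primary" (within the window)
choose_read_target("alice", now=1100.0)   # -> "replica" (window has passed)
```

A more precise variant tracks the replica's actual replication position instead of a fixed window, at the cost of extra bookkeeping.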

Monotonic Reads

Problem: User reads from Replica 1 (sees data), then Replica 2 (does not see data) → Data appears to disappear!

Solution: Always route a given user's reads to the same replica
          (session-based routing)
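Session-based routing can be as simple as hashing the user id to a stable replica index, so the same user always lands on the same replica. A minimal sketch (replica names are illustrative):

```python
import hashlib

REPLICAS = ["replica-1", "replica-2", "replica-3"]

def replica_for_user(user_id):
    """Hash the user id to a stable replica, giving monotonic reads
    via session affinity: the same user always reads the same replica."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return REPLICAS[int(digest, 16) % len(REPLICAS)]

# Same user -> same replica on every call, so data never "disappears"
# between two reads by moving to a less caught-up replica.
replica_for_user("alice") == replica_for_user("alice")   # always True
```

Note this only guarantees monotonic reads while the replica set is stable; adding or removing replicas remaps users (consistent hashing reduces that churn).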

Failover and Leader Election

When the primary database fails, one of the replicas must be promoted to become the new primary. This process is called failover.

Normal State:
Primary ← accepts all writes
Replicas ← handle reads, replicate from primary

Primary crashes:
→ Monitoring detects failure (health check timeout)
→ Failover triggered

Automatic Failover:
→ Replicas vote on which one becomes new primary
→ Replica with most up-to-date data wins election
→ New primary starts accepting writes
→ Other replicas point to new primary
→ Application is notified of new primary address

Time to failover: typically 30 seconds to 2 minutes
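The core of the election rule above ("most up-to-date replica wins") reduces to comparing replication positions among reachable replicas. A minimal sketch, not a real consensus implementation (field names are illustrative):

```python
def elect_new_primary(replicas):
    """Promote the reachable replica with the highest replication position,
    i.e., the most up-to-date copy of the old primary's log."""
    candidates = [r for r in replicas if r["reachable"]]
    if not candidates:
        raise RuntimeError("no replica available for promotion")
    return max(candidates, key=lambda r: r["log_position"])

replicas = [
    {"name": "replica-1", "reachable": True,  "log_position": 4821},
    {"name": "replica-2", "reachable": True,  "log_position": 4876},
    {"name": "replica-3", "reachable": False, "log_position": 4900},  # down
]

elect_new_primary(replicas)   # -> replica-2: most up-to-date *reachable* replica
```

Real systems (Raft, Paxos-based election) add voting rounds and term numbers so that only one candidate can win even under network partitions; this sketch shows only the selection criterion.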

Split-Brain Problem: If the primary is merely slow (not dead), it might still accept writes after a new primary has been elected. Two primaries accepting writes simultaneously creates conflicting data. Solutions include consensus protocols such as Raft and fencing mechanisms (for example, distributed locks with fencing tokens) that ensure only one node can act as primary at a time.

Data Consistency Models

Different replication configurations offer different consistency guarantees. Understanding these helps designers choose the right trade-off.

Strong Consistency

After a successful write, all subsequent reads return that write's data — immediately, from any node. The safest guarantee. Achieved by synchronous replication or directing all reads through the primary.

Write "x=5" to primary → Propagates synchronously to all replicas
Read from any replica → Always returns x=5 immediately

Eventual Consistency

After a write, replicas will eventually converge to the same value — but there is no guarantee of when. Reads might return stale data during the convergence window.

Write "x=5" to primary at T=0
Read from Replica at T=0.1s → Might return x=3 (old value, not yet synced)
Read from Replica at T=2s   → Returns x=5 (sync has completed)

Used by: DNS, Amazon S3, Cassandra in default configuration

Causal Consistency

If operation B causally depends on operation A (B came after A and could have seen A), then B is always seen after A by all nodes. Unrelated operations may be in any order.

Alice posts: "What is the weather today?" (Operation A)
Bob replies: "It's sunny!" (Operation B, causally follows A)

Causal consistency guarantees:
Anyone who sees Bob's reply also sees Alice's original post.
(No seeing a reply without the original message)
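One common way to enforce this is dependency tracking: each operation lists the operation ids it causally depends on, and a node delivers an operation only once all of its dependencies have been applied locally. A minimal sketch (the operation shape is an illustrative assumption):

```python
def can_deliver(op, seen_ids):
    """An operation is deliverable only once every operation it causally
    depends on has already been applied at this node."""
    return all(dep in seen_ids for dep in op["deps"])

post  = {"id": "A", "deps": [],    "text": "What is the weather today?"}
reply = {"id": "B", "deps": ["A"], "text": "It's sunny!"}

seen = set()
can_deliver(reply, seen)    # -> False: the reply arrived before the post,
                            #    so it must be buffered, not shown
seen.add(post["id"])
can_deliver(reply, seen)    # -> True: the original post is now visible
```

Production systems typically encode these dependencies compactly with vector clocks or version vectors rather than explicit id lists.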

Read Replicas in Practice

Read replicas are the most common use of replication. The primary handles all writes, while multiple read replicas handle read queries. This pattern supports read-heavy workloads like social media feeds and reporting dashboards.

Write path:
Application → Primary DB (all writes)

Read path:
Application → Load Balancer → Read Replica 1 (report queries)
                           → Read Replica 2 (user feed queries)
                           → Read Replica 3 (analytics queries)

Primary: ~1,000 writes/second
Read Replicas (×3): ~30,000 reads/second total

Adding more read replicas scales read capacity roughly linearly, until replication fan-out load on the primary becomes the bottleneck.
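The capacity figures above are a straightforward multiplication; a back-of-the-envelope sketch using the illustrative numbers from the text:

```python
def total_read_capacity(reads_per_replica_per_sec, replica_count):
    """Aggregate read throughput across identical read replicas,
    ignoring replication overhead and uneven load distribution."""
    return reads_per_replica_per_sec * replica_count

# 3 replicas at ~10,000 reads/sec each -> ~30,000 reads/sec total,
# matching the figures above.
total_read_capacity(10_000, 3)   # -> 30000
```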

Summary

Replication copies data across multiple servers to enable fault tolerance, read scaling, and geographic distribution. Single-leader replication is the most common pattern — one primary accepts writes, followers handle reads. Asynchronous replication is faster but risks brief data loss. Replication lag creates temporary inconsistency that thoughtful routing strategies can minimize. The consistency model chosen (strong, eventual, causal) defines how the system behaves when different nodes temporarily hold different data.
