CAP Theorem

What is CAP?

The CAP theorem (formally proven by Eric Brewer in 2000) states that a distributed data store can guarantee at most two of the following three properties at the same time:

  1. Consistency (C) — every read returns the most recent write or an error. All nodes in the cluster see the same data at the same time. A stale read is never returned to the client.
  2. Availability (A) — every request receives a response. The system never returns an error due to node failures, though the response may not reflect the latest write.
  3. Partition Tolerance (P) — the system continues to operate even when a network partition prevents some nodes from communicating with each other.

The critical insight is that partition tolerance is not optional. Network partitions — dropped packets, split-brain scenarios, datacenter link failures — happen in any real distributed system regardless of how well it is engineered. Choosing to sacrifice P means choosing to stop working whenever the network hiccups, which is not a viable production system.

The real choice is therefore always between CP and AP — between what the system does when a partition occurs.

During a partition…CP systemAP system
What happens to reads?Returns an error or blocks until consistency is restoredReturns data — possibly stale
What happens to writes?Rejected or queued until quorum is reachedAccepted on available nodes; reconciled later
Trade-offSacrifices availability to preserve correctnessSacrifices consistency to remain responsive

CP systems

A CP system prioritises consistency over availability. During a network partition, nodes that cannot confirm they have the latest data will refuse to serve requests — returning an error rather than risking a stale or incorrect answer. Once the partition heals and nodes re-synchronise, the system becomes available again.

CP systems are the right choice when correctness is non-negotiable. Financial transactions, inventory management, distributed locks, and leader election all require that every node agrees on the current state before proceeding. Serving a stale balance or granting two nodes the same lock would cause real harm.

  1. HBase — strong consistency via a single master coordinating region servers. Reads and writes block during partition recovery.
  2. Zookeeper / etcd — consensus-based coordination services used for distributed locks, leader election, and configuration. Use Raft or Paxos to guarantee that only a quorum-approved value is ever returned.
  3. Traditional RDBMS with synchronous replication — a write is only acknowledged after it has been committed to all replicas. The primary will stall or reject writes if it cannot reach a replica.
  4. Google Spanner — globally consistent via TrueTime; sacrifices latency (global synchronisation) to achieve external consistency across datacentres.

AP systems

An AP system prioritises availability over consistency. During a network partition, nodes continue serving requests using whatever data they currently have. When the partition heals, conflicting versions of data are reconciled — usually via last-write-wins, vector clocks, or application-level merge logic. This property is called eventual consistency: given enough time without new writes, all nodes will converge to the same value.

AP systems are the right choice when availability is critical and the application can tolerate briefly stale data. A social media feed, a product catalogue, or a shopping cart can show data that is a few seconds behind without causing harm. Refusing to serve any response at all — the CP trade-off — would be far worse for user experience.

  1. Cassandra — a peer-to-peer ring with no single master. Writes are accepted by any node and propagated asynchronously. Tunable consistency: you can request quorum reads (stronger) or single-node reads (faster) per query.
  2. DynamoDB — eventually consistent reads by default; strongly consistent reads are available at higher latency and cost. Highly available across multiple AZs.
  3. CouchDB — designed around eventual consistency with built-in conflict detection and resolution. Uses multi-version concurrency control (MVCC).
  4. DNS — the canonical AP system. Records propagate asynchronously across resolvers; different clients may receive different answers for minutes or hours after a change. The system never goes down to guarantee freshness.

CAP in practice

CAP describes a binary partition scenario, but real systems operate on a spectrum. Most modern databases expose knobs that let you tune the consistency/availability trade-off per operation, rather than fixing it for the entire system.

Cassandra's consistency levels are a good example. A write with QUORUM consistency requires a majority of replica nodes to acknowledge before returning success — trading some availability for stronger consistency. A write with ONE consistency acknowledges as soon as a single node writes it — maximum availability, weakest consistency. The application chooses per request.

When designing a system, the questions to ask are:

  1. What happens if a user reads stale data? — For a bank balance, it is unacceptable. For a social media like count, it is fine.
  2. What happens if a write is temporarily rejected? — A failed payment is catastrophic. A failed "add to wishlist" is a minor inconvenience.
  3. Can the application handle conflict resolution? — Shopping carts can be merged on the client side (Amazon's Dynamo paper describes exactly this). Financial ledgers cannot.

Different parts of the same system may make different choices. The payments service is CP; the product recommendation service is AP. CAP is not a system-wide label — it is a per-data-store, per-operation decision.

PACELC — the fuller picture

CAP only describes system behaviour during a network partition. But partitions are rare. What about normal operation? PACELC (Daniel Abadi, 2012) extends CAP to cover the latency/consistency trade-off that exists even when the network is healthy.

PACELC reads as two if-else branches:

  1. If there is a Partition (P) — choose between Availability (A) and Consistency (C). This is the original CAP choice.
  2. Else (E) — when there is no partition — choose between Latency (L) and Consistency (C). A system that replicates writes synchronously to all replicas before acknowledging is consistent but slow. One that acknowledges immediately and replicates asynchronously is fast but eventually consistent.
SystemDuring partitionDuring normal operationPACELC label
DynamoDB (default)AvailabilityLow latencyPA / EL
Cassandra (ONE)AvailabilityLow latencyPA / EL
HBaseConsistencyHigh consistency (slower)PC / EC
SpannerConsistencyHigh consistency (global sync)PC / EC
MySQL async replicationAvailabilityLow latencyPA / EL
MySQL sync replicationConsistencyHigher consistency (slower)PC / EC

PACELC makes explicit what CAP leaves implicit: the latency penalty of strong consistency is always present, not just during failures. This is why most internet-scale systems default to eventual consistency — the throughput and latency gains during normal operation are too large to sacrifice for a consistency guarantee that the application can often provide at a higher level anyway.