CAP Theorem
What is CAP?
The CAP theorem (formally proven by Eric Brewer in 2000) states that a distributed data store can guarantee at most two of the following three properties at the same time:
- Consistency (C) — every read returns the most recent write or an error. All nodes in the cluster see the same data at the same time. A stale read is never returned to the client.
- Availability (A) — every request receives a response. The system never returns an error due to node failures, though the response may not reflect the latest write.
- Partition Tolerance (P) — the system continues to operate even when a network partition prevents some nodes from communicating with each other.
The critical insight is that partition tolerance is not optional. Network partitions — dropped packets, split-brain scenarios, datacenter link failures — happen in any real distributed system regardless of how well it is engineered. Choosing to sacrifice P means choosing to stop working whenever the network hiccups, which is not a viable production system.
The real choice is therefore always between CP and AP — between what the system does when a partition occurs.
| During a partition… | CP system | AP system |
|---|---|---|
| What happens to reads? | Returns an error or blocks until consistency is restored | Returns data — possibly stale |
| What happens to writes? | Rejected or queued until quorum is reached | Accepted on available nodes; reconciled later |
| Trade-off | Sacrifices availability to preserve correctness | Sacrifices consistency to remain responsive |
CP systems
A CP system prioritises consistency over availability. During a network partition, nodes that cannot confirm they have the latest data will refuse to serve requests — returning an error rather than risking a stale or incorrect answer. Once the partition heals and nodes re-synchronise, the system becomes available again.
CP systems are the right choice when correctness is non-negotiable. Financial transactions, inventory management, distributed locks, and leader election all require that every node agrees on the current state before proceeding. Serving a stale balance or granting two nodes the same lock would cause real harm.
- HBase — strong consistency via a single master coordinating region servers. Reads and writes block during partition recovery.
- Zookeeper / etcd — consensus-based coordination services used for distributed locks, leader election, and configuration. Use Raft or Paxos to guarantee that only a quorum-approved value is ever returned.
- Traditional RDBMS with synchronous replication — a write is only acknowledged after it has been committed to all replicas. The primary will stall or reject writes if it cannot reach a replica.
- Google Spanner — globally consistent via TrueTime; sacrifices latency (global synchronisation) to achieve external consistency across datacentres.
AP systems
An AP system prioritises availability over consistency. During a network partition, nodes continue serving requests using whatever data they currently have. When the partition heals, conflicting versions of data are reconciled — usually via last-write-wins, vector clocks, or application-level merge logic. This property is called eventual consistency: given enough time without new writes, all nodes will converge to the same value.
AP systems are the right choice when availability is critical and the application can tolerate briefly stale data. A social media feed, a product catalogue, or a shopping cart can show data that is a few seconds behind without causing harm. Refusing to serve any response at all — the CP trade-off — would be far worse for user experience.
- Cassandra — a peer-to-peer ring with no single master. Writes are accepted by any node and propagated asynchronously. Tunable consistency: you can request quorum reads (stronger) or single-node reads (faster) per query.
- DynamoDB — eventually consistent reads by default; strongly consistent reads are available at higher latency and cost. Highly available across multiple AZs.
- CouchDB — designed around eventual consistency with built-in conflict detection and resolution. Uses multi-version concurrency control (MVCC).
- DNS — the canonical AP system. Records propagate asynchronously across resolvers; different clients may receive different answers for minutes or hours after a change. The system never goes down to guarantee freshness.
CAP in practice
CAP describes a binary partition scenario, but real systems operate on a spectrum. Most modern databases expose knobs that let you tune the consistency/availability trade-off per operation, rather than fixing it for the entire system.
Cassandra's consistency levels are a good example. A write with QUORUM consistency requires a majority of replica nodes to acknowledge before returning success — trading some availability for stronger consistency. A write with ONE consistency acknowledges as soon as a single node writes it — maximum availability, weakest consistency. The application chooses per request.
When designing a system, the questions to ask are:
- What happens if a user reads stale data? — For a bank balance, it is unacceptable. For a social media like count, it is fine.
- What happens if a write is temporarily rejected? — A failed payment is catastrophic. A failed "add to wishlist" is a minor inconvenience.
- Can the application handle conflict resolution? — Shopping carts can be merged on the client side (Amazon's Dynamo paper describes exactly this). Financial ledgers cannot.
Different parts of the same system may make different choices. The payments service is CP; the product recommendation service is AP. CAP is not a system-wide label — it is a per-data-store, per-operation decision.
PACELC — the fuller picture
CAP only describes system behaviour during a network partition. But partitions are rare. What about normal operation? PACELC (Daniel Abadi, 2012) extends CAP to cover the latency/consistency trade-off that exists even when the network is healthy.
PACELC reads as two if-else branches:
- If there is a Partition (P) — choose between Availability (A) and Consistency (C). This is the original CAP choice.
- Else (E) — when there is no partition — choose between Latency (L) and Consistency (C). A system that replicates writes synchronously to all replicas before acknowledging is consistent but slow. One that acknowledges immediately and replicates asynchronously is fast but eventually consistent.
| System | During partition | During normal operation | PACELC label |
|---|---|---|---|
| DynamoDB (default) | Availability | Low latency | PA / EL |
| Cassandra (ONE) | Availability | Low latency | PA / EL |
| HBase | Consistency | High consistency (slower) | PC / EC |
| Spanner | Consistency | High consistency (global sync) | PC / EC |
| MySQL async replication | Availability | Low latency | PA / EL |
| MySQL sync replication | Consistency | Higher consistency (slower) | PC / EC |
PACELC makes explicit what CAP leaves implicit: the latency penalty of strong consistency is always present, not just during failures. This is why most internet-scale systems default to eventual consistency — the throughput and latency gains during normal operation are too large to sacrifice for a consistency guarantee that the application can often provide at a higher level anyway.