Back-of-the-Envelope Estimation
Why it matters
Back-of-the-envelope estimation is the skill of quickly sizing a system using rough numbers and order-of-magnitude arithmetic. The goal is not precision — it is feasibility. You want to know whether a single database can handle the write load, whether the dataset fits in RAM, whether a CDN is necessary, and whether your architecture makes sense at the stated scale — all before writing a line of code or drawing a detailed diagram.
In a system design interview, estimation signals that you think at scale, not just at the happy path. In production engineering, it saves you from over-engineering solutions for problems you don't have and under-engineering for ones you do.
The process is always the same: state your assumptions explicitly, use round numbers, sanity-check the result against something you know, and adjust the architecture accordingly.
Key numbers to memorise
Powers of 2
Storage and memory sizes are conventionally expressed in powers of 2. Knowing these cold means you can convert between units in your head.
| Power | Exact value | Approximate | Name |
|---|---|---|---|
| 2¹⁰ | 1,024 | ~1 thousand | 1 KB |
| 2²⁰ | 1,048,576 | ~1 million | 1 MB |
| 2³⁰ | 1,073,741,824 | ~1 billion | 1 GB |
| 2⁴⁰ | ~1.1 × 10¹² | ~1 trillion | 1 TB |
| 2⁵⁰ | ~1.1 × 10¹⁵ | ~1 quadrillion | 1 PB |
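As a rough sketch of how the table is used for unit conversion, the helper below divides repeatedly by 2¹⁰ until the value is small enough to read. The name `human_bytes` is hypothetical, chosen for illustration, and is not part of any library.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]

def human_bytes(n: float) -> str:
    """Express a byte count in the largest binary unit (1 KB = 2**10 bytes)."""
    value = float(n)
    for unit in UNITS:
        # Stop once the value drops below 1,024 or we run out of units.
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024

print(human_bytes(2**30))   # 1.0 GB
print(human_bytes(3.3e12))  # 3.0 TB
```

The second call shows the roughly 10% gap between a decimal trillion (10¹²) and 2⁴⁰ bytes. For order-of-magnitude work that gap can usually be ignored.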
Latency numbers every engineer should know
These are rough orders of magnitude (Jeff Dean's numbers, lightly updated). The key intuition: a random read from RAM is ~1,000× faster than one from SSD, SSD is tens of times faster than HDD, and a US → Europe round trip is ~300,000,000× slower than an L1 cache read.
| Operation | Latency |
|---|---|
| L1 cache reference | 0.5 ns |
| L2 cache reference | 7 ns |
| Main memory (RAM) reference | 100 ns |
| Read 1 MB sequentially from RAM | 250 µs |
| SSD random read | 150 µs |
| Read 1 MB sequentially from SSD | 1 ms |
| HDD seek | 10 ms |
| Read 1 MB sequentially from HDD | 20 ms |
| Round trip within same datacenter | 0.5 ms |
| Send 1 MB over 1 Gbps network | 10 ms |
| Round trip US West → US East | 40 ms |
| Round trip US → Europe | 150 ms |
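To compare any two rows of the table, it helps to put everything in one unit. The sketch below copies a few latencies from the table into nanoseconds and computes ratios. The dictionary keys and the `times_slower` helper are hypothetical names used only for this illustration.

```python
# Selected latencies from the table above, all converted to nanoseconds.
NS = 1
US = 1_000 * NS
MS = 1_000 * US

LATENCY_NS = {
    "l1_cache": 0.5 * NS,
    "ram": 100 * NS,
    "ssd_random_read": 150 * US,
    "hdd_seek": 10 * MS,
    "same_dc_round_trip": 0.5 * MS,
    "us_to_europe_round_trip": 150 * MS,
}

def times_slower(slow: str, fast: str) -> float:
    """Return how many times slower `slow` is than `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]

print(f"{times_slower('ssd_random_read', 'ram'):,.0f}x")               # 1,500x
print(f"{times_slower('hdd_seek', 'ssd_random_read'):,.0f}x")          # 67x
print(f"{times_slower('us_to_europe_round_trip', 'l1_cache'):,.0f}x")  # 300,000,000x
```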
Data size quick reference
| Type | Size |
|---|---|
| char | 1 byte |
| int | 4 bytes |
| long / double | 8 bytes |
| UUID | 16 bytes |
| Timestamp (Unix) | 4–8 bytes |
| Short text (tweet, URL) | ~200–500 bytes with metadata |
| Thumbnail image | ~20 KB |
| Compressed JPEG (profile photo) | ~50–100 KB |
| High-quality photo | ~300 KB |
| 1-minute audio (compressed) | ~1 MB |
| 1-minute HD video (compressed) | ~50 MB |
Time conversions
The most useful conversion: there are 86,400 seconds in a day. For estimation, round to 10⁵. This is the single most-used number in traffic calculations.
| Period | Seconds | Rounded |
|---|---|---|
| 1 minute | 60 | ~60 |
| 1 hour | 3,600 | ~4 × 10³ |
| 1 day | 86,400 | ~10⁵ |
| 1 month | 2,592,000 | ~2.5 × 10⁶ |
| 1 year | 31,536,000 | ~3 × 10⁷ |
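It is worth knowing how much error the rounded figures introduce. The following sketch compares the exact and rounded columns of the table. `rounding_error` is a hypothetical helper name.

```python
# Exact and rounded seconds per period, copied from the table above.
EXACT = {"hour": 3_600, "day": 86_400, "month": 2_592_000, "year": 31_536_000}
ROUNDED = {"hour": 4e3, "day": 1e5, "month": 2.5e6, "year": 3e7}

def rounding_error(period: str) -> float:
    """Relative error introduced by using the rounded figure for `period`."""
    return (ROUNDED[period] - EXACT[period]) / EXACT[period]

for period in EXACT:
    print(f"{period:>5}: {rounding_error(period):+.1%}")
```

Every rounded value lands within about 16% of the exact one, which is well inside the tolerance of an order-of-magnitude estimate. Note that dividing by 10⁵ instead of 86,400 slightly understates QPS, so round the result up if in doubt.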
Traffic estimation
Traffic estimation answers: how many requests per second must the system handle? Start from daily active users (DAU) and work down to queries per second (QPS).
The formula
QPS = (DAU × requests_per_user_per_day) / seconds_per_day
Peak QPS ≈ 2 × average QPS (conservative)
Peak QPS ≈ 3 × average QPS (for bursty workloads)
Read/write ratio
Most real systems are read-heavy. Always split QPS into reads and writes — they drive different infrastructure decisions. Writes hit the primary database; reads can be served from replicas or caches.
# Example: a social feed with 100:1 read/write ratio
Total QPS = 10,000
Write QPS = 10,000 / (100 + 1) ≈ 100 writes/sec
Read QPS = 10,000 − 100 = 9,900 reads/sec
Scale benchmarks
| Scale | DAU | Avg QPS (10 req/user/day) | Architecture hint |
|---|---|---|---|
| Small | 100K | ~10 QPS | Single server + single DB is fine |
| Medium | 10M | ~1,000 QPS | Load balancer, read replicas, basic cache |
| Large | 100M | ~10,000 QPS | Sharding, heavy caching, CDN, multiple regions |
| Massive | 1B | ~100,000 QPS | Full distribution, custom infrastructure |
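The traffic formula, the peak multiplier, and the read/write split can be sketched in a few lines. The function names below are hypothetical, and the 2× default burst factor is simply the conservative multiplier from the formula above.

```python
SECONDS_PER_DAY = 86_400

def average_qps(dau: int, requests_per_user_per_day: float) -> float:
    """Average queries per second from daily active users."""
    return dau * requests_per_user_per_day / SECONDS_PER_DAY

def peak_qps(avg_qps: float, burst_factor: float = 2.0) -> float:
    """Peak load as a multiple of the average (use 3.0 for bursty workloads)."""
    return avg_qps * burst_factor

def split_read_write(total_qps: float, reads_per_write: float) -> tuple[float, float]:
    """Split total QPS into (read_qps, write_qps) for an N:1 read/write ratio."""
    write_qps = total_qps / (reads_per_write + 1)
    return total_qps - write_qps, write_qps

# Reproduce the benchmark table at 10 requests per user per day.
for dau in (100_000, 10_000_000, 100_000_000, 1_000_000_000):
    avg = average_qps(dau, 10)
    print(f"{dau:>13,} DAU -> avg {avg:>9,.0f} QPS, peak {peak_qps(avg):>9,.0f} QPS")

reads, writes = split_read_write(10_000, 100)
print(f"reads ~{reads:,.0f}/s, writes ~{writes:,.0f}/s")  # reads ~9,901/s, writes ~99/s
```

The exact averages come out about 16% above the table's rounded values because the table divides by 10⁵ rather than 86,400.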
Storage estimation
Storage estimation answers: how much disk space does the system need, and for how long? Always estimate for a multi-year horizon; five years is the standard in interviews.
The formula
Daily storage = writes_per_day × record_size
Total storage = daily_storage × retention_days
# Always add ~20% overhead for indexes and metadata (replication is a separate multiplier)
Distinguishing storage types
- Metadata (SQL/NoSQL) — text fields, IDs, timestamps. Usually small (hundreds of bytes per record). Scales in GB to TB range even for large services.
- Media / blobs (object storage) — images, video, audio. Sizes jump by orders of magnitude (KB to MB per record). A service with photo uploads almost certainly needs object storage (S3, GCS) and a CDN — the database is not the right place for binary blobs.
A common mistake is lumping both together. Separate them — they require completely different infrastructure and the media number often dwarfs the metadata number by 100× or more.
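A minimal sketch of keeping the two storage classes separate is shown below. The `StorageClass` type and the photo-sharing numbers (1M uploads per day, 500-byte records, 300 KB images) are assumptions made up for illustration; the 20% overhead is applied to metadata only, on the assumption that object storage carries no index overhead worth modelling.

```python
from dataclasses import dataclass

@dataclass
class StorageClass:
    name: str
    writes_per_day: float
    record_bytes: float
    overhead: float = 0.20  # index/metadata overhead from the formula above

    def total_bytes(self, retention_days: int) -> float:
        """Daily storage times retention, plus overhead."""
        daily = self.writes_per_day * self.record_bytes
        return daily * retention_days * (1 + self.overhead)

FIVE_YEARS = 365 * 5

# Hypothetical photo-sharing service: 1M uploads per day.
metadata = StorageClass("metadata", writes_per_day=1_000_000, record_bytes=500)
media = StorageClass("media", writes_per_day=1_000_000, record_bytes=300_000, overhead=0.0)

for s in (metadata, media):
    print(f"{s.name:>8}: {s.total_bytes(FIVE_YEARS) / 1e12:,.1f} TB")

ratio = media.total_bytes(FIVE_YEARS) / metadata.total_bytes(FIVE_YEARS)
print(f"media is ~{ratio:.0f}x larger than metadata")
```

With these made-up inputs the media total is hundreds of times the metadata total, which is the gap that justifies sizing the two separately.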
Bandwidth & memory
Bandwidth
Bandwidth tells you the data throughput the system must sustain — important for CDN sizing, network provisioning, and understanding egress costs.
Inbound bandwidth = write_QPS × average_request_size
Outbound bandwidth = read_QPS × average_response_size
Cache memory sizing
The 80/20 rule (Pareto principle) is a useful rule of thumb in caching: roughly 20% of the data receives 80% of the traffic. Caching that 20% eliminates most database reads.
Cache size = read_QPS × seconds_per_day × avg_response_size × 0.20
# Sanity check: does this fit on a single Redis instance?
# A typical Redis server holds 10–100 GB comfortably.
# If the number is larger, plan for a Redis cluster.
Replication factor
Always multiply your raw storage number by the replication factor before comparing it against infrastructure costs. A dataset stored with 3× replication costs 3× the disk. Standard factors: 3× for most databases, 2–3× for object storage (with erasure coding it can be lower).
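The bandwidth, cache, and replication formulas in this section can be sketched together as follows. All function names are hypothetical, the 100 GB node capacity is the upper end of the range quoted above, and the sample service (1,000 reads/sec, 1 KB responses, 10 TB of raw data) is invented for the example.

```python
SECONDS_PER_DAY = 86_400
GB = 1e9

def bandwidth_bytes_per_sec(qps: float, avg_payload_bytes: float) -> float:
    """Sustained throughput: requests per second times average payload size."""
    return qps * avg_payload_bytes

def cache_bytes(read_qps: float, avg_response_bytes: float,
                hot_fraction: float = 0.20) -> float:
    """Hot-set size: by default 20% of one day's read volume."""
    return read_qps * SECONDS_PER_DAY * avg_response_bytes * hot_fraction

def replicated_bytes(raw_bytes: float, replication_factor: int = 3) -> float:
    """Disk actually consumed once every byte is stored N times."""
    return raw_bytes * replication_factor

def fits_on_one_node(size_bytes: float, node_capacity_bytes: float = 100 * GB) -> bool:
    """Sanity check against an assumed single-node cache capacity."""
    return size_bytes <= node_capacity_bytes

# Hypothetical service: 1,000 reads/sec, 1 KB responses, 10 TB of raw data.
read_qps, response_bytes = 1_000, 1_000
cache = cache_bytes(read_qps, response_bytes)
print(f"outbound: {bandwidth_bytes_per_sec(read_qps, response_bytes) / 1e6:.1f} MB/s")
print(f"cache:    {cache / GB:.2f} GB, single node ok: {fits_on_one_node(cache)}")
print(f"disk:     {replicated_bytes(10e12) / 1e12:.0f} TB after 3x replication")
```

Because the cache formula counts every read rather than every distinct object, it tends to overestimate the hot set. Treat the result as an upper bound.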
Worked example: URL Shortener
A URL shortener converts a long URL into a short code (short.ly/abc123) and redirects visitors to the original. This is a read-heavy, write-light system.
Assumptions
- 180M new short URLs created per month → ~6M per day
- 100:1 read/write ratio
- Average URL length: 200 bytes; total record size including metadata: ~300 bytes
- 5-year retention
Traffic
Write QPS = 6,000,000 / 86,400 ≈ 70 writes/sec
Read QPS = 70 × 100 = 7,000 reads/sec
Peak reads ≈ 7,000 × 2 = ~14,000 reads/sec
Storage
Records over 5 years = 6M/day × 365 × 5 = ~11B records
Storage (metadata) = 11B × 300 bytes = ~3.3 TB
(+ 20% overhead) ≈ 4 TB
Bandwidth
Inbound = 70 writes/sec × 300 bytes ≈ 21 KB/s (negligible)
Outbound = 7,000 reads/sec × 300 bytes ≈ 2.1 MB/s (manageable)
Cache
Daily read volume = 7,000 reads/sec × 86,400 sec = ~600M reads/day
Cache top 20% = 600M × 0.20 × 300 bytes = ~36 GB
→ Fits comfortably on a single Redis instance (or small cluster).
Cache hit ratio should be very high — popular short codes are
accessed millions of times while the long tail is rarely used.
Architecture signals
- 14K peak reads/sec — a single database can handle this, but add a read replica and Redis cache to reduce latency below 5 ms.
- 70 writes/sec — trivially handled by any relational database.
- 4 TB metadata over 5 years — a single PostgreSQL instance can hold this; sharding is not needed yet.
- High read/write ratio → cache is the single most impactful optimisation.
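The whole URL shortener estimate fits in a short script, which makes it easy to change one assumption and see what moves. This is a sketch using the unrounded inputs from the assumptions above (6M new URLs per day, 100:1 reads, 300-byte records, 5 years); the variable names are illustrative only.

```python
SECONDS_PER_DAY = 86_400

# Assumptions from the worked example.
new_urls_per_day = 6_000_000
reads_per_write = 100
record_bytes = 300
retention_days = 365 * 5

write_qps = new_urls_per_day / SECONDS_PER_DAY          # ~70 writes/sec
read_qps = write_qps * reads_per_write                  # ~7,000 reads/sec
peak_read_qps = read_qps * 2                            # ~14,000 reads/sec

records = new_urls_per_day * retention_days             # ~11B records
storage_bytes = records * record_bytes * 1.20           # ~4 TB with 20% overhead

outbound_bytes_per_sec = read_qps * record_bytes        # ~2.1 MB/s
cache_bytes = read_qps * SECONDS_PER_DAY * record_bytes * 0.20  # ~36 GB

print(f"write QPS  {write_qps:,.0f}")
print(f"read QPS   {read_qps:,.0f} (peak {peak_read_qps:,.0f})")
print(f"records    {records / 1e9:.1f}B")
print(f"storage    {storage_bytes / 1e12:.1f} TB")
print(f"outbound   {outbound_bytes_per_sec / 1e6:.1f} MB/s")
print(f"cache      {cache_bytes / 1e9:.0f} GB")
```

The exact results differ from the hand calculation only by rounding, which is the point of the sanity check: none of the architecture signals change.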
Worked example: Social Feed
A Twitter-like microblogging platform where users post short text messages and browse a personalised feed of posts from people they follow. This is a read-heavy, write-moderate, fan-out-heavy system.
Assumptions
- 300M DAU
- 20% of users post daily → 60M posts/day
- Average post: 200 bytes of text + 100 bytes metadata = 300 bytes
- 10% of posts include an image (average 300 KB compressed)
- Each user reads their feed 5× per day, loading 20 posts per load
- Average user has 200 followers
- 5-year retention for posts; media stored indefinitely in object storage
Traffic
# Writes (posts)
Write QPS = 60,000,000 / 86,400 ≈ 700 posts/sec
Peak write QPS ≈ 700 × 3 ≈ 2,100 posts/sec
# Reads (timeline)
Timeline requests/day = 300M users × 5 reads = 1.5B requests/day
Read QPS = 1,500,000,000 / 86,400 ≈ 17,000 reads/sec
Peak read QPS ≈ 17,000 × 2 ≈ 34,000 reads/sec
# Fan-out writes (push model: pre-write to each follower's feed cache)
Fan-out writes/sec = 700 posts/sec × 200 followers = 140,000 writes/sec
→ This is why Twitter-scale systems use hybrid push/pull fan-out
Storage
# Text metadata
Posts over 5 years = 60M/day × 365 × 5 = ~110B posts
Text storage = 110B × 300 bytes = ~33 TB
(+ 20% overhead) ≈ 40 TB
# Media (images)
Image uploads/day = 60M posts × 10% = 6M images/day
Media storage/day = 6M × 300 KB = ~1.8 TB/day
Media over 5 years = 1.8 TB × 365 × 5 ≈ 3.3 PB
→ Text metadata → sharded relational or wide-column DB (Cassandra)
→ Media → object storage (S3/GCS) + CDN; the DB never sees images
Bandwidth
# Inbound (uploads)
Text writes = 700/sec × 300 bytes ≈ 210 KB/s
Image writes = 6M images / 86,400 sec × 300 KB ≈ 20 MB/s ← dominant
# Outbound (feed reads)
Per request = 20 posts × 300 bytes = 6 KB (text only)
Total text = 17,000 reads/sec × 6 KB ≈ 100 MB/s
Image views = much larger → offload entirely to CDN
Cache
# Hot posts (top 20% get 80% of reads)
Daily read volume = 17,000 reads/sec × 86,400 × 6 KB = ~8.8 TB reads/day
Cache top 20% = 8.8 TB × 0.20 = ~1.76 TB
→ Too large for a single Redis instance.
→ Redis Cluster across multiple nodes, or limit cache to
the top-N trending posts + each user's pre-computed feed.
Architecture signals
- 34K peak read QPS — well beyond a single database; requires sharding + heavy caching.
- 140K fan-out writes/sec on average, and a pure push model is infeasible for celebrity accounts with millions of followers. Use a hybrid model: push on write for accounts with fewer than 10K followers; pull and merge posts from celebrity accounts at read time.
- 3.3 PB of media — object storage is mandatory; no relational database stores blobs at this scale economically.
- 1.8 TB/day image ingest — CDN with origin offload + async processing pipeline for resizing thumbnails (message queue → worker fleet → object storage).
- Cache is essential but too large for a single node — pre-compute timelines and store in distributed cache keyed by user ID.
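The write-side numbers for the social feed, plus the hybrid fan-out rule, can be sketched as below. Inputs are the assumptions from this worked example; `fanout_strategy` and `PUSH_FOLLOWER_LIMIT` are hypothetical names, and the 10K cut-off is the threshold mentioned in the signals above rather than a tuned value.

```python
SECONDS_PER_DAY = 86_400
KB, PB = 1e3, 1e15

# Assumptions from the worked example.
posts_per_day = 60_000_000
image_fraction = 0.10
image_bytes = 300 * KB
avg_followers = 200
retention_days = 365 * 5

post_qps = posts_per_day / SECONDS_PER_DAY                  # ~700 posts/sec
fanout_writes_per_sec = post_qps * avg_followers            # ~140,000 writes/sec

images_per_day = posts_per_day * image_fraction             # ~6M images/day
image_ingest_bytes_per_sec = images_per_day * image_bytes / SECONDS_PER_DAY  # ~20 MB/s
media_bytes = images_per_day * image_bytes * retention_days # ~3.3 PB

PUSH_FOLLOWER_LIMIT = 10_000  # assumed threshold between push and pull

def fanout_strategy(follower_count: int) -> str:
    """Push on write for ordinary accounts, pull at read time for celebrities."""
    return "push" if follower_count < PUSH_FOLLOWER_LIMIT else "pull"

print(f"fan-out writes/sec: {fanout_writes_per_sec:,.0f}")
print(f"image ingest:       {image_ingest_bytes_per_sec / 1e6:.1f} MB/s")
print(f"media over 5 years: {media_bytes / PB:.2f} PB")
print(fanout_strategy(200), fanout_strategy(5_000_000))  # push pull
```

A single post from an account with 5M followers would trigger 5M feed writes under pure push, which is why the strategy switches to pull above the threshold.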