Message Queues
A message queue is a durable buffer between a producer (the service that generates work) and a consumer (the service that processes it). The producer writes a message to the queue and moves on immediately — it does not wait for the consumer to finish. This decoupling is one of the most important patterns in distributed systems.
Why Queues
- Absorb traffic spikes — if 10,000 orders arrive in one second but the processing service can only handle 1,000/s, a queue buffers the excess. Without it, the processing service would be overwhelmed or requests would be dropped.
- Decouple services — the producer doesn't need to know about the consumer. Services can be deployed, restarted, or scaled independently without coordinating with each other.
- Enable retry logic — if a consumer crashes while processing a message, the message is re-queued and retried rather than lost. Dead-letter queues capture messages that fail repeatedly for manual inspection.
- Async processing — tasks that don't need to complete synchronously (sending emails, resizing images, generating reports) are pushed to a queue and processed in the background, keeping API response times fast.
- Fan-out — a single message can be consumed by multiple independent services simultaneously (notifications, analytics, audit logging) without the producer knowing about any of them.
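The buffering, retry, and dead-letter behaviour described above can be sketched with Python's standard `queue` module. This is a minimal in-memory illustration, not a real broker: `process`, `drain`, `MAX_ATTEMPTS`, and the failing order 13 are all hypothetical names and values chosen for the example.

```python
import queue

MAX_ATTEMPTS = 3  # assumed retry limit for this sketch


def process(order_id: int) -> None:
    """Hypothetical handler: always fails for order 13 to simulate a poison message."""
    if order_id == 13:
        raise ValueError("cannot process order 13")


def drain(work: queue.Queue, dead_letters: list) -> list:
    """Consume until the queue is empty; return the IDs that succeeded."""
    done = []
    while True:
        try:
            order_id, attempts = work.get_nowait()
        except queue.Empty:
            return done
        try:
            process(order_id)
            done.append(order_id)
        except ValueError:
            if attempts + 1 < MAX_ATTEMPTS:
                work.put((order_id, attempts + 1))  # re-queue for another try
            else:
                dead_letters.append(order_id)  # park for manual inspection


# The producer enqueues a burst and moves on without waiting for the consumer.
work = queue.Queue()
for order_id in range(10, 16):
    work.put((order_id, 0))

dead_letters = []
completed = drain(work, dead_letters)
```

In this run, orders 10, 11, 12, 14, and 15 complete, while order 13 is attempted three times and then lands in `dead_letters`. A real broker would persist the messages and track attempts itself; the attempt counter carried in the tuple is only a stand-in for that.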
Delivery Guarantees
- At-most-once: messages are delivered zero or one times. Fast, but a message can be lost if the consumer or broker fails before it is processed. Acceptable for metrics or logs where occasional loss is tolerable.
- At-least-once: messages are delivered one or more times. The broker redelivers until the consumer acknowledges, so an accepted message is not lost, but duplicates are possible on retry. Consumers must be idempotent: processing the same message twice must produce the same result as processing it once.
- Exactly-once: each message takes effect exactly one time. The hardest guarantee to achieve; it requires transactional semantics between the broker and the consumer's storage. Kafka supports this with transactions for consume-process-produce flows that stay inside Kafka; for other systems, and for side effects outside the broker, it is typically simulated with at-least-once delivery plus idempotent consumer logic (deduplication keys).
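To see why at-least-once delivery produces duplicates, here is a toy in-memory queue that removes a message only when it is acknowledged. It is a sketch under simplifying assumptions: the class `AtLeastOnceQueue` and its methods are hypothetical, and `redeliver_unacked` stands in for whatever mechanism a real broker uses (for example a timeout or a closed connection) to detect a dead consumer.

```python
from collections import deque


class AtLeastOnceQueue:
    """Toy queue: a message is forgotten only after the consumer acks it."""

    def __init__(self, messages):
        self._ready = deque(messages)
        self._in_flight = {}

    def receive(self):
        """Hand out the next ready message, or None if there is none."""
        if not self._ready:
            return None
        msg_id, body = self._ready.popleft()
        self._in_flight[msg_id] = body
        return msg_id, body

    def ack(self, msg_id):
        """Consumer finished: drop the message for good."""
        del self._in_flight[msg_id]

    def redeliver_unacked(self):
        """Make every delivered-but-unacked message ready again."""
        while self._in_flight:
            self._ready.append(self._in_flight.popitem())


q = AtLeastOnceQueue([("m1", 5), ("m2", 7)])
total = 0  # non-idempotent side effect: a running sum

# The first consumer applies m1, then crashes before acknowledging it.
msg_id, amount = q.receive()
total += amount
q.redeliver_unacked()

# A replacement consumer drains the queue and acks every message.
msg = q.receive()
while msg is not None:
    msg_id, amount = msg
    total += amount
    q.ack(msg_id)
    msg = q.receive()
```

No message is lost, but `m1` is applied twice, so `total` ends at 17 instead of the intended 12. That gap is exactly what idempotent consumers are meant to close.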
Idempotency in practice
Because at-least-once is the practical default for most systems, consumers should be designed to be idempotent from the start. Common patterns:
- Idempotency key: each message carries a unique ID; the consumer checks a processed-IDs store before acting and skips duplicates.
- Upsert operations: write operations are designed so re-running them produces the same outcome (e.g. `INSERT ... ON CONFLICT DO UPDATE` rather than a plain `INSERT`).
- Conditional updates: apply changes only if the current state matches the expected pre-condition (optimistic locking).
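Two of these patterns, the idempotency key and the conditional update, can be illustrated in a few lines. This is a minimal sketch, assuming an in-memory set and dict in place of durable storage; `handle`, `set_status`, and the message fields are hypothetical names invented for the example.

```python
processed_ids = set()        # stand-in for a durable processed-IDs store
balances = {"acct-1": 100}   # hypothetical state the consumer updates


def handle(message: dict) -> bool:
    """Apply a credit once per message ID; return False for a duplicate."""
    if message["id"] in processed_ids:
        return False
    balances[message["account"]] += message["amount"]
    processed_ids.add(message["id"])
    return True


def set_status(order: dict, expected_version: int, status: str) -> bool:
    """Optimistic locking: write only if the version is the one we read."""
    if order["version"] != expected_version:
        return False
    order["status"] = status
    order["version"] += 1
    return True


credit = {"id": "msg-42", "account": "acct-1", "amount": 25}
first = handle(credit)
second = handle(credit)  # redelivery of the same message is skipped

order = {"status": "pending", "version": 1}
applied = set_status(order, 1, "paid")
replayed = set_status(order, 1, "paid")  # stale version, so it is ignored
```

In a real consumer, the ID check and the state change would need to commit in the same transaction. Otherwise a crash between the two steps can still apply a message twice or mark it processed without applying it.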
Kafka vs RabbitMQ vs SQS
| | Kafka | RabbitMQ | SQS |
|---|---|---|---|
| Model | Distributed log; consumers read from offsets | Push-based broker; messages deleted on ACK | Managed pull queue |
| Retention | Configurable (days/weeks); replay possible | Until consumed or TTL | Up to 14 days |
| Throughput | Millions of messages/sec | Thousands–hundreds of thousands/sec | Scales automatically |
| Ordering | Per-partition ordering guaranteed | Per-queue FIFO (with single consumer) | Standard: no guarantee; FIFO queue: yes |
| Replay | Yes — consumers can re-read from any offset | No — messages are deleted on ACK | No |
| Best for | Event streaming, audit logs, analytics pipelines | Task queues, complex routing, RPC patterns | Simple async decoupling on AWS |
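The "Model" and "Replay" rows come down to one difference: a log keeps records and tracks a read position per consumer group, while a classic queue deletes a message once it is acknowledged. The sketch below illustrates the log side only. It is a simplified in-memory model, not Kafka's client API, and the `Log` class, its methods, and the group names are hypothetical.

```python
class Log:
    """Append-only log: reads never delete; each group tracks its own offset."""

    def __init__(self):
        self.records = []
        self.offsets = {}

    def append(self, record):
        self.records.append(record)

    def poll(self, group):
        """Return the records this group has not read yet and advance its offset."""
        start = self.offsets.get(group, 0)
        self.offsets[group] = len(self.records)
        return self.records[start:]

    def seek(self, group, offset):
        """Move the group's offset so it re-reads from a chosen position."""
        self.offsets[group] = offset


log = Log()
for event in ["created", "paid", "shipped"]:
    log.append(event)

analytics = log.poll("analytics")    # reads all three events
audit = log.poll("audit")            # independent group sees the same three
nothing_new = log.poll("analytics")  # offset is already at the end
log.seek("analytics", 1)
replayed = log.poll("analytics")     # replays from offset 1
```

Because reading does not remove anything, adding a new consumer group or rewinding an existing one costs the producer nothing. This is why the log model suits fan-out and replay, at the price of retaining data until the retention period expires.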
When to use each
- Kafka — when you need to replay events, build event-sourced systems, fan out to many independent consumers, or process high-volume streams. The log model makes it the foundation for event-driven architectures.
- RabbitMQ — when you need flexible routing (topic exchanges, header routing), priority queues, or a simpler operational model than Kafka for moderate throughput task queues.
- SQS — when you are already in AWS and want a zero-ops managed queue. Use SQS Standard for maximum throughput; SQS FIFO when message ordering and exactly-once processing matter.