Message Queues

A message queue is a durable buffer between a producer (the service that generates work) and a consumer (the service that processes it). The producer writes a message to the queue and moves on immediately — it does not wait for the consumer to finish. This decoupling is one of the most important patterns in distributed systems.
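The hand-off described above can be sketched with Python's standard library. This is only an in-memory illustration: queue.Queue stands in for a real broker and is not durable, and the run_demo function and the "order-N" payloads are hypothetical names chosen for the example.

```python
import queue
import threading


def run_demo(n_messages=3):
    """Producer enqueues n messages; a background consumer handles them."""
    q = queue.Queue()   # in-memory stand-in for a broker (not durable)
    processed = []

    def consumer():
        while True:
            msg = q.get()          # blocks until a message is available
            if msg is None:        # sentinel: no more work
                q.task_done()
                break
            processed.append(f"handled {msg}")
            q.task_done()

    worker = threading.Thread(target=consumer)
    worker.start()

    # Producer: put() returns as soon as the message is buffered,
    # without waiting for the consumer to handle it.
    for i in range(n_messages):
        q.put(f"order-{i}")
    q.put(None)

    # Only the demo waits here, so the result can be inspected.
    q.join()
    worker.join()
    return processed


if __name__ == "__main__":
    print(run_demo(3))
```

In a real deployment the producer and consumer would be separate processes talking to a broker over the network, but the shape is the same: the producer's call returns once the message is buffered.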

Why Queues

  1. Absorb traffic spikes: if 10,000 orders arrive in one second but the processing service can only handle 1,000/s, a queue buffers the excess and the consumer works through the backlog at its own pace. Without the queue, the processing service would be overwhelmed or requests would be dropped.
  2. Decouple services: the producer doesn't need to know about the consumer. Services can be deployed, restarted, or scaled independently without coordinating with each other.
  3. Enable retry logic: if a consumer crashes before acknowledging a message, the broker redelivers the message rather than losing it. Dead-letter queues capture messages that fail repeatedly so they can be inspected manually.
  4. Process work asynchronously: tasks that don't need to complete synchronously (sending emails, resizing images, generating reports) are pushed to a queue and processed in the background, keeping API response times fast.
  5. Fan out: a single message can be consumed by multiple independent services simultaneously (notifications, analytics, audit logging) without the producer knowing about any of them.
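The retry and dead-letter behaviour from item 3 can be sketched in memory as follows. Real brokers implement this on the server side; the drain function, the MAX_ATTEMPTS limit, and the flaky handler are all assumptions made up for this illustration.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry limit for this sketch


def drain(messages, handler, max_attempts=MAX_ATTEMPTS):
    """Process messages, re-queueing failures; return (done, dead_letters)."""
    main_queue = deque((msg, 0) for msg in messages)  # (payload, failed attempts)
    done, dead_letters = [], []
    while main_queue:
        msg, attempts = main_queue.popleft()
        try:
            handler(msg)
            done.append(msg)
        except Exception:
            attempts += 1
            if attempts >= max_attempts:
                dead_letters.append(msg)            # give up: park for inspection
            else:
                main_queue.append((msg, attempts))  # re-queue for another try
    return done, dead_letters


def make_flaky_handler():
    """Hypothetical handler: 'flaky' fails once, 'poison' always fails."""
    seen = {}

    def handler(msg):
        seen[msg] = seen.get(msg, 0) + 1
        if msg == "poison":
            raise ValueError("cannot parse message")
        if msg == "flaky" and seen[msg] == 1:
            raise TimeoutError("transient failure")

    return handler


if __name__ == "__main__":
    print(drain(["ok", "flaky", "poison"], make_flaky_handler()))
```

The transient failure succeeds on its second attempt, while the message that can never be processed ends up in the dead-letter list instead of blocking the queue forever.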

Delivery Guarantees

  1. At-most-once: messages are delivered zero or one times. Fast, but messages can be lost on failure. Acceptable for metrics or logs where occasional loss is tolerable.
  2. At-least-once: messages are delivered one or more times. Messages are not lost as long as the broker persists them and the consumer acknowledges only after processing, but duplicates are possible on retry. Consumers must be idempotent: processing the same message twice must produce the same result as processing it once.
  3. Exactly-once: each message takes effect exactly one time. This is the hardest guarantee to achieve because it requires transactional semantics between the broker and the consumer's storage. Kafka supports it through idempotent producers and transactions for pipelines that read from and write back to Kafka; in other systems it is typically approximated by combining at-least-once delivery with idempotent consumer logic (deduplication keys).
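The difference between the first two guarantees largely comes down to when the consumer acknowledges a message. The simulation below is a simplified sketch, not any broker's API: the deliver function, its ack_first flag, and the single simulated crash are assumptions for illustration.

```python
def deliver(messages, ack_first, crash_on):
    """Simulate one consumer that crashes once while handling `crash_on`.

    Returns the list of side effects that were actually applied.
    """
    pending = list(messages)   # unacknowledged messages held by the broker
    effects = []
    crashed = False
    while pending:
        msg = pending[0]
        if ack_first:
            pending.pop(0)                 # ack before processing
            if msg == crash_on and not crashed:
                crashed = True             # crash: already acked, so the message is lost
                continue
            effects.append(msg)
        else:
            effects.append(msg)            # side effect happens first
            if msg == crash_on and not crashed:
                crashed = True             # crash before ack: broker redelivers
                continue
            pending.pop(0)                 # ack after processing
    return effects


if __name__ == "__main__":
    print(deliver(["a", "b", "c"], ack_first=True, crash_on="b"))
    print(deliver(["a", "b", "c"], ack_first=False, crash_on="b"))
```

Acknowledging first drops "b" (at-most-once); acknowledging last applies "b" twice (at-least-once), which is why the consumer has to tolerate duplicates.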

Idempotency in practice

Because at-least-once is the practical default for most systems, consumers should be designed to be idempotent from the start. Common patterns:

  1. Idempotency key: each message carries a unique ID; the consumer checks a processed-IDs store before acting and skips duplicates. Recording the ID in the same transaction as the side effect prevents a crash between the two steps from letting a duplicate through.
  2. Upsert operations: write operations are designed so that re-running them produces the same outcome (e.g. INSERT ... ON CONFLICT DO UPDATE rather than a plain INSERT).
  3. Conditional updates: apply changes only if the current state matches the expected pre-condition (optimistic locking).
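The three patterns above can be sketched with Python's built-in sqlite3 module. The table names, column names, and helper functions are hypothetical, and the upsert syntax assumes SQLite 3.24 or newer; a production consumer would use its own datastore and schema.

```python
import sqlite3


def setup():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE processed (message_id TEXT PRIMARY KEY);
        CREATE TABLE balances (account TEXT PRIMARY KEY, amount INTEGER, version INTEGER);
        CREATE TABLE profiles (user_id TEXT PRIMARY KEY, email TEXT);
    """)
    db.execute("INSERT INTO balances VALUES ('acct-1', 100, 1)")
    db.commit()
    return db


def credit_once(db, message_id, account, delta):
    """Idempotency key: apply the credit only the first time message_id is seen."""
    with db:  # one transaction: the ID record and the credit commit together
        cur = db.execute(
            "INSERT OR IGNORE INTO processed VALUES (?)", (message_id,))
        if cur.rowcount == 0:
            return False  # duplicate delivery, skip
        db.execute(
            "UPDATE balances SET amount = amount + ? WHERE account = ?",
            (delta, account))
        return True


def upsert_email(db, user_id, email):
    """Upsert: re-running with the same input leaves the same row."""
    with db:
        db.execute(
            "INSERT INTO profiles (user_id, email) VALUES (?, ?) "
            "ON CONFLICT (user_id) DO UPDATE SET email = excluded.email",
            (user_id, email))


def set_amount_if_version(db, account, new_amount, expected_version):
    """Conditional update: succeeds only if the row is at the expected version."""
    with db:
        cur = db.execute(
            "UPDATE balances SET amount = ?, version = version + 1 "
            "WHERE account = ? AND version = ?",
            (new_amount, account, expected_version))
        return cur.rowcount == 1
```

In each helper, replaying the same message is harmless: the second credit is skipped, the second upsert rewrites identical data, and the second conditional update finds a newer version and does nothing.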

Kafka vs RabbitMQ vs SQS

|            | Kafka | RabbitMQ | SQS |
|------------|-------|----------|-----|
| Model      | Distributed log; consumers read from offsets | Push-based broker; messages deleted on ACK | Managed pull queue |
| Retention  | Configurable (days/weeks); replay possible | Until consumed or TTL | Up to 14 days |
| Throughput | Millions of messages/sec | Thousands to hundreds of thousands/sec | Scales automatically |
| Ordering   | Per-partition ordering guaranteed | Per-queue FIFO (with single consumer) | Standard: no guarantee; FIFO queue: yes |
| Replay     | Yes: consumers can re-read from any offset | No: messages are deleted on ACK | No |
| Best for   | Event streaming, audit logs, analytics pipelines | Task queues, complex routing, RPC patterns | Simple async decoupling on AWS |
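The Model and Replay rows differ because of where the read position lives. The toy classes below are a sketch of that idea only; Log and AckQueue are hypothetical names and do not mirror the Kafka, RabbitMQ, or SQS client APIs.

```python
from collections import deque


class Log:
    """Append-only log: reading never removes records (Kafka-style model)."""

    def __init__(self):
        self.records = []
        self.offsets = {}  # consumer group -> next offset to read

    def append(self, record):
        self.records.append(record)

    def poll(self, group, max_records=10):
        start = self.offsets.get(group, 0)
        batch = self.records[start:start + max_records]
        self.offsets[group] = start + len(batch)  # advance this group's position
        return batch

    def seek(self, group, offset):
        self.offsets[group] = offset  # rewind (or skip ahead) to replay


class AckQueue:
    """Queue where an acknowledged message is gone for good."""

    def __init__(self):
        self.messages = deque()

    def publish(self, message):
        self.messages.append(message)

    def receive_and_ack(self):
        return self.messages.popleft() if self.messages else None


if __name__ == "__main__":
    log = Log()
    for record in ["a", "b", "c"]:
        log.append(record)
    print(log.poll("analytics"), log.poll("audit"))  # each group sees every record
    log.seek("analytics", 1)
    print(log.poll("analytics"))                     # replay from offset 1
```

Because each consumer group tracks its own offset, adding a new group or rewinding an existing one does not affect the others, which is what makes replay and fan-out cheap in the log model.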

When to use each

  1. Kafka: when you need to replay events, build event-sourced systems, fan out to many independent consumers, or process high-volume streams. The log model makes it a common foundation for event-driven architectures.
  2. RabbitMQ: when you need flexible routing (topic exchanges, header routing), priority queues, or a simpler operational model than Kafka for moderate-throughput task queues.
  3. SQS: when you are already in AWS and want a zero-ops managed queue. Use SQS Standard for maximum throughput; use SQS FIFO when message ordering and exactly-once processing matter (FIFO queues deduplicate messages within a 5-minute window).