Availability & Scaling

Availability and scaling are two sides of the same coin. Scaling determines how much load your system can handle; the data centre topology you deploy into determines how resilient it is when things go wrong. Getting both right is the foundation of any production-grade system.

Scaling

Scaling is the process of increasing a system's capacity to handle more load. There are two fundamental approaches.

Vertical scaling (scale up)

Add more resources to the existing machine — more CPU cores, more RAM, faster storage. Simple to implement (no application changes required) but has a hard ceiling: there is a maximum size machine you can buy. It is also a single point of failure — one machine, one outage.

Horizontal scaling (scale out)

Add more machines to the pool and distribute load across them. Theoretically unlimited and provides redundancy — one machine failing does not bring down the service. The trade-off is complexity: the application must be stateless so any server can handle any request. Session state must live in a shared store (Redis, a database) rather than in server memory.
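As a minimal sketch of distributing load across a pool, the round-robin balancer below hands requests to backends in rotation and skips any backend marked as down. The class name, method names, and backend names (app-1, app-2, app-3) are hypothetical and chosen only for illustration; a real load balancer would also run its own health checks.

```python
class RoundRobinBalancer:
    """Hands out backends in rotation, skipping ones marked as down."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)
        self._healthy = set(self._backends)
        self._next = 0  # index of the next backend to try

    def mark_down(self, backend):
        # Assumption: something external (e.g. a health checker) calls this.
        self._healthy.discard(backend)

    def next_backend(self):
        # Try each backend at most once per call, starting where we left off.
        for _ in range(len(self._backends)):
            candidate = self._backends[self._next]
            self._next = (self._next + 1) % len(self._backends)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy backends")


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
before = [balancer.next_backend() for _ in range(3)]

balancer.mark_down("app-2")  # simulate one machine failing
after = [balancer.next_backend() for _ in range(4)]

print(before)  # ['app-1', 'app-2', 'app-3']
print(after)   # ['app-1', 'app-3', 'app-1', 'app-3']
```

Because any backend can take any request, losing app-2 only shrinks the pool; the remaining machines keep serving.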

	Vertical	Horizontal
Ceiling	Largest machine available	Effectively unlimited
Fault tolerance	Single point of failure	Redundant by design
Application changes	None	Must be stateless
Cost	Expensive at the top end	Commodity hardware, pay-as-you-scale

Auto-scaling

Cloud platforms (AWS, GCP, Azure) can automatically add or remove instances based on metrics — CPU usage, request rate, queue depth. This matches capacity to demand without manual intervention, keeping costs low during quiet periods and maintaining performance during spikes. Stateless design is a prerequisite.
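One way to picture a metric-driven scaling decision is a proportional rule: scale the instance count by the ratio of the observed metric to its target, then clamp to configured bounds. The sketch below assumes that rule; the function name, parameters, and default bounds are hypothetical, and real autoscalers add cooldown periods and tolerance bands that are omitted here.

```python
import math


def desired_instances(current, metric, target, min_instances=1, max_instances=10):
    """Proportional scaling: current * (metric / target), rounded up and clamped."""
    if current < 1:
        raise ValueError("current must be at least 1")
    if target <= 0:
        raise ValueError("target must be positive")
    raw = math.ceil(current * metric / target)
    return max(min_instances, min(max_instances, raw))


# 4 instances at 90% CPU with a 60% target: scale out.
print(desired_instances(4, 90, 60))                    # 6
# 4 instances at 15% CPU: scale in, but never below the floor of 2.
print(desired_instances(4, 15, 60, min_instances=2))   # 2
# A large spike is capped at the configured maximum.
print(desired_instances(8, 95, 50, max_instances=10))  # 10
```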

Stateless design

A stateless server holds no user-specific data in memory between requests. Any request can be routed to any instance, which is what makes horizontal scaling and auto-scaling possible. State is externalised to shared infrastructure:

Sessions — stored in Redis or a database, not in server memory.
Uploaded files — written to object storage (S3, GCS), not the local filesystem.
Cached data — stored in a distributed cache (Redis), not in-process.
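The session case can be sketched as follows. SharedSessionStore is an in-memory stand-in for an external store such as Redis (values are serialised to JSON to mimic crossing a network boundary), and AppInstance, login, and add_to_cart are hypothetical names used only for this illustration.

```python
import json
import uuid


class SharedSessionStore:
    """In-memory stand-in for an external session store such as Redis."""

    def __init__(self):
        self._data = {}

    def save(self, session_id, session):
        # Serialise, as a real store would only ever hold bytes or strings.
        self._data[session_id] = json.dumps(session)

    def load(self, session_id):
        raw = self._data.get(session_id)
        return json.loads(raw) if raw is not None else None


class AppInstance:
    """A stateless server: it keeps a handle to the store, never user data."""

    def __init__(self, name, store):
        self.name = name
        self._store = store

    def login(self, user):
        session_id = uuid.uuid4().hex
        self._store.save(session_id, {"user": user, "cart": []})
        return session_id

    def add_to_cart(self, session_id, item):
        session = self._store.load(session_id)
        if session is None:
            raise KeyError("unknown session")
        session["cart"].append(item)
        self._store.save(session_id, session)
        return session["cart"]


store = SharedSessionStore()
app_1 = AppInstance("app-1", store)
app_2 = AppInstance("app-2", store)

session_id = app_1.login("alice")            # first request lands on app-1
app_1.add_to_cart(session_id, "book")
cart = app_2.add_to_cart(session_id, "pen")  # next request lands on app-2
print(cart)  # ['book', 'pen']
```

Either instance can be removed or replaced between requests without losing the cart, because neither one ever held it.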

Data Centres

Cloud providers organise their infrastructure into a hierarchy of geographic units that directly affect a system's availability and latency characteristics.

Regions and Availability Zones

Region — a geographic area (e.g. us-east-1, eu-west-2). Each region is an independent cluster of data centres. Choosing a region affects latency for your users, data residency compliance, and which services are available.
Availability Zone (AZ) — one or more discrete data centres within a region, each with independent power, cooling, and networking. AZs within a region are connected by low-latency private links. Deploying across multiple AZs protects against a single data centre failure.

A service that runs in a single AZ can survive instance failures but goes down if the AZ experiences an outage. Spreading across three AZs means the service stays up even if one AZ is completely lost, provided the remaining two have enough capacity to absorb its share of the traffic. This is the standard pattern for production workloads targeting high availability.
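Surviving the loss of an AZ has a capacity cost, which a short calculation makes concrete. The sketch below assumes load is spread evenly across zones and that the surviving zones must carry the full peak on their own; the function name and the figure of 12 instances at peak are hypothetical.

```python
import math


def instances_per_zone(peak_instances, zones, zone_failures_tolerated=1):
    """Instances each zone must run so the survivors can still carry peak load."""
    surviving = zones - zone_failures_tolerated
    if surviving < 1:
        raise ValueError("at least one zone must survive")
    return math.ceil(peak_instances / surviving)


peak = 12  # hypothetical: instances needed to serve peak traffic

three_az = instances_per_zone(peak, zones=3)
two_az = instances_per_zone(peak, zones=2)

print(three_az, three_az * 3)  # 6 per zone, 18 in total
print(two_az, two_az * 2)      # 12 per zone, 24 in total
```

Under these assumptions, three zones need 1.5 times peak capacity to tolerate one zone failure, while two zones need 2 times peak, which is one reason three is the usual choice.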

Active-active vs active-passive

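Active-active — all nodes serve traffic simultaneously. Maximises throughput and gives near-instant failover, because the surviving nodes are already serving traffic and only need enough headroom to absorb the failed node's share. Requires the application to handle concurrent writes across nodes, which makes consistency (conflicting updates, ordering) a hard problem.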
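Active-passive — one node serves all traffic; standby nodes are idle but ready to take over on failure. Simpler consistency model, but standby capacity sits unused and failover is not instant: it takes as long as failure detection (health-check interval and failure threshold) plus traffic redirection, which can range from a few seconds to minutes when it depends on a DNS update and cached records expiring.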
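Active-passive failover driven by health checks can be sketched as a small state machine: the standby is promoted only after several consecutive failed checks on the active node. The class name, node names, and the threshold of three are assumptions made for illustration, and the sketch ignores what happens to the failed node afterwards.

```python
class ActivePassivePair:
    """Promotes the standby after N consecutive failed health checks."""

    def __init__(self, active, standby, failure_threshold=3):
        self.active = active
        self.standby = standby
        self.failure_threshold = failure_threshold
        self._consecutive_failures = 0

    def record_health_check(self, healthy):
        """Feed one check result for the active node; return who serves traffic."""
        if healthy:
            self._consecutive_failures = 0
        else:
            self._consecutive_failures += 1
            if self._consecutive_failures >= self.failure_threshold:
                # Failover: swap roles and start counting afresh.
                self.active, self.standby = self.standby, self.active
                self._consecutive_failures = 0
        return self.active


pair = ActivePassivePair("node-a", "node-b")
serving = [pair.record_health_check(ok) for ok in [True, False, False, False]]
print(serving)  # ['node-a', 'node-a', 'node-a', 'node-b']
```

The delay before promotion is roughly the failure threshold multiplied by the health-check interval; a single failed check is ignored so that a brief blip does not trigger an unnecessary failover.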

Multi-region

Deploying across regions lets you serve users from a nearby region, which lowers latency, and provides disaster recovery if an entire region goes down. The challenge is data: keeping databases consistent across geographically separated regions introduces significant latency and consistency trade-offs. A common approach is to send all writes to a primary region and replicate them asynchronously to read replicas in secondary regions, accepting a small replication lag on reads served there.
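The primary-write, replicated-read pattern and its replication lag can be illustrated with a toy model. Everything here is a simplification under stated assumptions: MultiRegionStore is a hypothetical class, replication is triggered by an explicit replicate() call rather than a background stream, and the region names are reused from the examples earlier in this section.

```python
class MultiRegionStore:
    """Toy model: writes go to the primary, replicas catch up on replicate()."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._data = {primary: {}}
        for name in replicas:
            self._data[name] = {}
        self._pending = []  # writes accepted by the primary but not yet shipped

    def write(self, key, value):
        # All writes are routed to the primary region.
        self._data[self.primary][key] = value
        self._pending.append((key, value))

    def read(self, key, region):
        # Reads are served locally and may be stale on a replica.
        return self._data[region].get(key)

    def replicate(self):
        # Ship pending writes, in order, to every replica region.
        for key, value in self._pending:
            for region, data in self._data.items():
                if region != self.primary:
                    data[key] = value
        self._pending.clear()


store = MultiRegionStore(primary="eu-west-2", replicas=["us-east-1"])
store.write("profile:42", "v1")

stale = store.read("profile:42", region="us-east-1")  # replica has not caught up
store.replicate()
fresh = store.read("profile:42", region="us-east-1")

print(stale, fresh)  # None v1
```

The window between write() and replicate() is the replication lag: a user who writes in one region and immediately reads from another may not see their own update.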

	Single region, multi-AZ	Multi-region
Protects against	Data centre failure, instance failure	Full region outage, natural disasters
Latency benefit	None — same region	Serve users from nearest region
Data complexity	Low — replication within region is fast	High — cross-region consistency trade-offs
Cost	Moderate	High