Availability & Scaling
Availability and scaling are two sides of the same coin. Scaling determines how much load your system can handle; the data centre topology you deploy into determines how resilient it is when things go wrong. Getting both right is the foundation of any production-grade system.
Scaling
Scaling is the process of increasing a system's capacity to handle more load. There are two fundamental approaches.
Vertical scaling (scale up)
Add more resources to the existing machine — more CPU cores, more RAM, faster storage. Simple to implement (no application changes required) but has a hard ceiling: there is a maximum size machine you can buy. It is also a single point of failure — one machine, one outage.
Horizontal scaling (scale out)
Add more machines to the pool and distribute load across them, typically behind a load balancer. Capacity grows with the number of machines rather than the size of any one of them, and the pool is redundant: one machine failing does not bring down the service. The trade-off is complexity: for any server to handle any request, the application tier must be stateless, so session state lives in a shared store (Redis, a database) rather than in server memory.
| | Vertical | Horizontal |
|---|---|---|
| Ceiling | Largest machine available | Effectively unlimited |
| Fault tolerance | Single point of failure | Redundant by design |
| Application changes | None | Must be stateless |
| Cost | Expensive at the top end | Commodity hardware, pay-as-you-scale |
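To make the "distribute load across a pool" idea concrete, here is a minimal sketch of round-robin distribution that skips failed machines. The class name `RoundRobinBalancer` and the backend names are hypothetical, chosen only for illustration; a real load balancer would also run its own health checks and handle concurrency.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backends in rotation, skipping any marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._rotation = cycle(self.backends)

    def mark_down(self, backend):
        # In practice a health checker would call this; here it is manual.
        self.healthy.discard(backend)

    def pick(self):
        # Try each backend at most once per call.
        for _ in range(len(self.backends)):
            candidate = next(self._rotation)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
first_round = [lb.pick() for _ in range(3)]   # each backend once
lb.mark_down("app-2")                         # simulate one machine failing
after_failure = [lb.pick() for _ in range(4)] # traffic continues on the rest
```

With one of three machines down, requests keep flowing to the other two, which is the redundancy that a single vertically scaled machine cannot offer.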
Auto-scaling
Cloud platforms (AWS, GCP, Azure) can automatically add or remove instances based on metrics — CPU usage, request rate, queue depth. This matches capacity to demand without manual intervention, keeping costs low during quiet periods and maintaining performance during spikes. Stateless design is a prerequisite.
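As a rough illustration of metric-driven scaling, the sketch below uses a simple proportional rule: resize the pool so that the per-instance metric lands near a target value. The function name `desired_instances` and its parameters are assumptions made for this example, not any platform's API; real auto-scalers add cooldown periods, tolerances, and smoothing on top of a rule like this.

```python
import math


def desired_instances(current, metric_value, target_value, min_size=1, max_size=10):
    """Return the pool size that would bring the average metric near the target.

    current       -- number of instances running now
    metric_value  -- observed average per instance (e.g. CPU percent)
    target_value  -- the value we want each instance to sit at
    """
    if current < 1:
        raise ValueError("current must be at least 1")
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    proportional = math.ceil(current * metric_value / target_value)
    # Clamp to the configured bounds so a spike cannot scale without limit.
    return max(min_size, min(max_size, proportional))


# 4 instances at 90% CPU with a 60% target: scale out.
scale_out = desired_instances(current=4, metric_value=90, target_value=60)
# 4 instances at 15% CPU: scale in, but never below the minimum of 2.
scale_in = desired_instances(current=4, metric_value=15, target_value=60, min_size=2)
# 8 instances at 120% of target load: capped at the default maximum of 10.
capped = desired_instances(current=8, metric_value=120, target_value=60)
```

The minimum and maximum bounds matter as much as the rule itself: the floor preserves redundancy during quiet periods and the ceiling bounds cost during spikes.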
Stateless design
A stateless server holds no user-specific data in memory between requests. Any request can be routed to any instance, which is what makes horizontal scaling and auto-scaling possible. State is externalised to shared infrastructure:
- Sessions — stored in Redis or a database, not in server memory.
- Uploaded files — written to object storage (S3, GCS), not the local filesystem.
- Cached data — stored in a distributed cache (Redis), not in-process.
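The effect of externalising state can be sketched with two server instances that share one session store. Here `SessionStore` is an in-memory stand-in for Redis or a database, and `AppServer`, `add_to_cart`, and the session id are hypothetical names used only for this example.

```python
class SessionStore:
    """In-memory stand-in for a shared store such as Redis (demo assumption)."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id, {})

    def put(self, session_id, session):
        self._data[session_id] = dict(session)


class AppServer:
    """A stateless server: it keeps no per-user data on the instance itself."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # shared and external to this instance

    def add_to_cart(self, session_id, item):
        session = self.store.get(session_id)
        cart = session.get("cart", []) + [item]
        self.store.put(session_id, {**session, "cart": cart})
        return cart


store = SessionStore()
server_a = AppServer("app-1", store)
server_b = AppServer("app-2", store)

# The first request lands on one instance, the second on another.
server_a.add_to_cart("session-123", "book")
cart = server_b.add_to_cart("session-123", "pen")
```

Because neither server holds the cart in its own memory, either one can be terminated or replaced between the two requests without the user losing anything.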
Data Centres
Cloud providers organise their infrastructure into a hierarchy of geographic units that directly affect a system's availability and latency characteristics.
Regions and Availability Zones
- Region — a geographic area (e.g. us-east-1, eu-west-2). Each region is an independent cluster of data centres. Choosing a region affects latency for your users, data residency compliance, and which services are available.
- Availability Zone (AZ) — one or more discrete data centres within a region, each with independent power, cooling, and networking. AZs within a region are connected by low-latency private links. Deploying across multiple AZs protects against a single data centre failure.
A service that runs in a single AZ can survive instance failures but goes down if the AZ experiences an outage. Spreading across three AZs means the service stays up even if one AZ is completely lost, provided the remaining zones have enough capacity to absorb its traffic. This is the standard pattern for production workloads targeting high availability.
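The capacity impact of losing a zone can be worked through with a small sketch. The helper names `spread_across_zones` and `surviving_fraction` and the zone labels are hypothetical; the placement is a plain round-robin, which is an assumption rather than how any particular provider schedules instances.

```python
from collections import Counter


def spread_across_zones(instance_count, zones):
    """Assign instances to zones round-robin so the spread is as even as possible."""
    return {f"instance-{i}": zones[i % len(zones)] for i in range(instance_count)}


def surviving_fraction(placement, failed_zone):
    """Fraction of instances still running after one zone is lost."""
    survivors = [zone for zone in placement.values() if zone != failed_zone]
    return len(survivors) / len(placement)


multi_az = spread_across_zones(6, ["az-a", "az-b", "az-c"])
per_zone = Counter(multi_az.values())             # two instances in each zone
multi_az_left = surviving_fraction(multi_az, "az-a")

single_az = spread_across_zones(6, ["az-a"])
single_az_left = surviving_fraction(single_az, "az-a")
```

With an even spread over three zones, one zone failing removes a third of the fleet, so a service that needs N instances at peak would provision about 1.5 × N to ride out the loss without degradation. The same failure in a single-AZ deployment leaves nothing running.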
Active-active vs active-passive
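- Active-active: all nodes serve traffic simultaneously. This maximises throughput and makes failover close to instant, because the surviving nodes are already serving traffic and only need to absorb the failed node's share. It requires the application to handle concurrent writes on different nodes, which makes consistency (conflict detection and resolution) a hard problem.
- Active-passive: one node serves all traffic; standby nodes are idle but ready to take over on failure. The consistency model is simpler because only one node accepts writes, but standby capacity sits unused and failover is not instantaneous: it takes at least as long as the health-check timeout, plus DNS propagation (bounded by the record's TTL) when traffic is redirected through DNS.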
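The active-passive failover delay can be illustrated with a minimal health-check monitor. `FailoverMonitor`, `detection_time_seconds`, the node names, and the threshold of three consecutive failures are all assumptions for this sketch; real systems also need fencing so that a recovered primary does not resume accepting writes.

```python
class FailoverMonitor:
    """Promotes the standby after `threshold` consecutive failed health checks."""

    def __init__(self, primary, standby, threshold=3):
        self.active = primary
        self.standby = standby
        self.threshold = threshold
        self._failures = 0

    def record_check(self, healthy):
        """Record one health-check result and return the node serving traffic."""
        if healthy:
            self._failures = 0  # a single success resets the streak
            return self.active
        self._failures += 1
        if self._failures >= self.threshold and self.standby is not None:
            # Promote the standby; nothing is left in reserve afterwards.
            self.active, self.standby = self.standby, None
            self._failures = 0
        return self.active


def detection_time_seconds(interval_seconds, threshold):
    """Lower bound on failover time: checks must fail `threshold` times in a row."""
    return interval_seconds * threshold


monitor = FailoverMonitor("node-a", "node-b", threshold=3)
serving = [monitor.record_check(ok) for ok in [True, False, False, False]]
delay = detection_time_seconds(interval_seconds=10, threshold=3)
```

Requiring several consecutive failures avoids failing over on a single dropped check, at the cost of a longer outage before the standby takes over: with a 10-second interval and a threshold of three, detection alone takes about 30 seconds.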
Multi-region
Deploying across regions lets you serve users from a nearby region, lowering latency, and provides disaster recovery if an entire region goes down. The challenge is data: keeping databases consistent across geographically separated regions forces a trade-off between write latency and consistency. Most systems send writes to a primary region and replicate them asynchronously to read replicas in secondary regions, accepting a small replication lag on reads.
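The primary-region pattern and its replication lag can be sketched with a toy write log. `Primary`, `Replica`, and the explicit `sync` call are hypothetical simplifications; in a real database replication runs continuously in the background and lag is measured in time rather than in pending writes.

```python
class Primary:
    """Accepts all writes and records them in an ordered log."""

    def __init__(self):
        self.log = []  # list of (key, value) writes in commit order

    def write(self, key, value):
        self.log.append((key, value))


class Replica:
    """Serves reads in a secondary region from a copy that trails the primary."""

    def __init__(self, primary):
        self.primary = primary
        self.applied = 0  # how many log entries have been applied locally
        self.data = {}

    def sync(self):
        # Apply every write the replica has not seen yet, in order.
        for key, value in self.primary.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.primary.log)

    def read(self, key):
        return self.data.get(key)

    def lag(self):
        return len(self.primary.log) - self.applied


primary = Primary()
replica = Replica(primary)

primary.write("user:1", "Ada")
replica.sync()
primary.write("user:1", "Grace")   # not yet replicated

stale = replica.read("user:1")     # the replica still returns the old value
pending = replica.lag()
replica.sync()
fresh = replica.read("user:1")
```

The window in which `stale` differs from the primary is the replication lag the text refers to. Workflows that must read their own writes are often routed to the primary region for that reason.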
| | Single region, multi-AZ | Multi-region |
|---|---|---|
| Protects against | Data centre failure, instance failure | Full region outage, natural disasters |
| Latency benefit | None — same region | Serve users from nearest region |
| Data complexity | Low — replication within region is fast | High — cross-region consistency trade-offs |
| Cost | Moderate | High |