Proxies & Load Balancers
Proxies and load balancers are the traffic layer of a distributed system — the components that sit between clients and servers and decide how requests are intercepted, inspected, and routed. Understanding each one, and how they differ, is essential for designing systems that are scalable, secure, and observable.
| Component | Sits between | Primary job |
|---|---|---|
| Forward Proxy | Client → internet | Represents the client; controls outbound traffic |
| Reverse Proxy | Internet → servers | Represents the server; controls inbound traffic |
| Load Balancer | Internet → server pool | Distributes requests across multiple server instances |
| API Gateway | Clients → microservices | Single entry point with auth, routing, rate limiting |
Forward Proxy
A forward proxy (often just called a "proxy") sits between a client and the internet and makes requests on behalf of the client. From the destination server's perspective, the request comes from the proxy — the client's real IP address is hidden.
What a forward proxy does
- Anonymity — the origin server sees the proxy's IP, not the client's. VPNs and Tor build on the same idea of relaying traffic through an intermediary, though each works differently from a plain HTTP proxy.
- Content filtering and access control — corporate networks route all outbound traffic through a forward proxy to block specific domains, log activity, or enforce security policies. The proxy can inspect and reject requests that violate policy before they leave the network.
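- Caching — a forward proxy can cache responses from the internet. If 500 employees all request the same software update, the proxy fetches it once and serves the cached copy to everyone, saving bandwidth. For HTTPS traffic this only works if the proxy terminates TLS itself (TLS interception); otherwise it simply tunnels the encrypted bytes.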
- Geo-unblocking — by routing through a proxy in a different country, clients can access content that is geographically restricted.
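The caching bullet can be modelled as a shared cache keyed by URL. The sketch below is a toy, assuming a fixed TTL and a hypothetical `fetch_upstream` callable standing in for the real outbound request; production caching proxies such as Squid also honour `Cache-Control` and validation headers.

```python
import time
from typing import Callable, Dict, Tuple


class CachingForwardProxy:
    """Toy shared cache in front of the internet.

    `fetch_upstream` is a hypothetical stand-in for a real outbound HTTP request.
    """

    def __init__(self, fetch_upstream: Callable[[str], bytes], ttl_seconds: float = 300.0):
        self.fetch_upstream = fetch_upstream
        self.ttl_seconds = ttl_seconds
        self._cache: Dict[str, Tuple[float, bytes]] = {}  # url -> (stored_at, body)
        self.upstream_fetches = 0

    def get(self, url: str) -> bytes:
        now = time.monotonic()
        entry = self._cache.get(url)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # cache hit: no outbound traffic
        body = self.fetch_upstream(url)  # miss or expired: fetch from the origin once
        self.upstream_fetches += 1
        self._cache[url] = (now, body)
        return body


if __name__ == "__main__":
    proxy = CachingForwardProxy(lambda url: b"update bytes for " + url.encode())
    for _ in range(500):  # 500 employees request the same update
        proxy.get("http://updates.example.com/v2.bin")
    print(proxy.upstream_fetches)  # prints 1
```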
Forward proxy vs VPN
Both hide the client's IP and route traffic through an intermediary. The key difference is scope and encryption: a VPN encrypts all traffic at the OS level (every application, every protocol); a forward proxy typically works at the HTTP/HTTPS level and requires explicit configuration per application. A VPN creates a private tunnel; a forward proxy acts as a web intermediary.
Where it appears in system design
Forward proxies are rarely part of the server-side architecture you design. They appear when your service needs to make outbound requests through a controlled egress point — for example, all requests from your backend to third-party APIs route through a single forward proxy so the third party can whitelist one IP instead of your entire fleet's IP range.
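Pointing a service at such an egress proxy is usually a single client setting. A minimal sketch using the Python standard library's `urllib.request.ProxyHandler`; the proxy address `egress-proxy.internal:3128` is hypothetical. For `https://` URLs, urllib asks the proxy to open a tunnel with HTTP `CONNECT`, so TLS still runs end to end between the service and the third-party API.

```python
import urllib.request

# Hypothetical egress proxy address; replace with your own.
EGRESS_PROXY = "http://egress-proxy.internal:3128"


def build_egress_opener(proxy_url: str = EGRESS_PROXY) -> urllib.request.OpenerDirector:
    """Return an opener that sends every HTTP and HTTPS request through one forward proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Usage (not executed here): requests made with this opener leave the network
# from the proxy's IP, which is the single address the third party allowlists.
# opener = build_egress_opener()
# opener.open("https://api.partner.example/v1/status", timeout=5)
```

If `ProxyHandler` is created without arguments, urllib instead reads the conventional `http_proxy`/`https_proxy` environment variables, which is a common way to apply the same egress policy fleet-wide without code changes.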
Reverse Proxy
A reverse proxy sits in front of one or more servers and forwards incoming client requests to them. From the client's perspective, it is talking directly to the service — the origin servers are completely hidden. The client sends a request to api.example.com; the reverse proxy receives it and decides which backend server handles it.
The distinction from a forward proxy is direction: a forward proxy represents the client to the outside world; a reverse proxy represents the server to the outside world.
What a reverse proxy provides
- SSL/TLS termination — the reverse proxy handles the HTTPS handshake and encryption. Backend servers communicate over plain HTTP on the private network, reducing their CPU overhead and centralising certificate management.
- Security and DDoS protection — hides origin server IP addresses; combined with firewall rules that accept traffic only from the proxy, the origins cannot be reached directly. Can enforce rate limiting, block malicious IPs, and act as a Web Application Firewall (WAF).
- Caching — caches responses from the origin and serves them directly for repeated requests, without forwarding to the backend at all.
- Compression — gzips or brotli-compresses responses before sending them to clients, reducing bandwidth without any changes to the application server.
- Request routing and virtual hosting — routes different paths or hostnames to different backend services: `/api/*` goes to the application servers; `/static/*` goes to object storage or a CDN origin. Multiple domains can share a single reverse proxy.
- Observability — centralised access logging, metrics, and tracing across all inbound requests before they reach application code.
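Host- and path-based routing boils down to a lookup table consulted on every request. A minimal sketch, assuming longest-prefix matching on the path; the hostnames and upstream addresses are hypothetical, and real proxies (Nginx `location` blocks, ALB listener rules) have their own, richer precedence rules.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical routing table: virtual host -> [(path prefix, upstream)].
ROUTES: Dict[str, List[Tuple[str, str]]] = {
    "api.example.com": [
        ("/static/", "http://static-origin.internal"),
        ("/api/", "http://app-servers.internal:8080"),
    ],
    "blog.example.com": [
        ("/", "http://blog.internal:8000"),
    ],
}


def pick_upstream(host: str, path: str) -> Optional[str]:
    """Select a backend: match the virtual host first, then the longest path prefix.

    Returns None when nothing matches; a real proxy would fall back to a
    default server or return an error such as 404.
    """
    host = host.split(":", 1)[0].lower()  # ignore an explicit port in the Host header
    best: Optional[Tuple[str, str]] = None
    for prefix, upstream in ROUTES.get(host, []):
        if path.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
            best = (prefix, upstream)
    return best[1] if best else None
```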
Common implementations
| Tool | Typical use |
|---|---|
| Nginx | High-performance reverse proxy and static file server; extremely low memory footprint |
| HAProxy | Battle-tested L4/L7 proxy and load balancer; preferred for very high connection counts |
| Cloudflare | Global reverse proxy with CDN, DDoS protection, and WAF built in |
| AWS ALB / NLB | Managed L7 (ALB) and L4 (NLB) proxies tightly integrated with the AWS ecosystem |
| Envoy | Service-mesh sidecar proxy; used in Kubernetes with Istio for east-west traffic |
Load Balancer
A load balancer is a specialised reverse proxy whose primary job is to distribute incoming requests across a pool of server instances. It is the primary mechanism for horizontal scaling and fault tolerance — if one server fails its health check, the load balancer automatically stops sending it traffic.
Layer 4 vs Layer 7
- Layer 4 (transport layer) — routes by IP address and TCP/UDP port. Does not inspect the packet payload. Extremely fast; handles millions of connections per second. Suitable when routing decisions don't require knowledge of the request content (e.g. distributing raw TCP connections to a database cluster).
- Layer 7 (application layer) — inspects the full HTTP request: URL path, headers, cookies, and body. Enables content-based routing, SSL termination, sticky sessions, and request rewriting. Slightly higher overhead but far more powerful. Most web-facing load balancers are L7.
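The difference in what each layer can see shows up directly in the inputs to the routing decision. A minimal sketch, assuming hypothetical pool names (`api`, `web`) and a hypothetical sticky-session cookie called `lb_backend`: the L4 function receives only addresses and ports, while the L7 function must first parse the HTTP request head.

```python
import hashlib
from http.cookies import SimpleCookie
from typing import Dict, List


def _stable_index(key: str, n: int) -> int:
    # hashlib rather than built-in hash(), so the mapping is identical across processes.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big") % n


def l4_pick(src_ip: str, src_port: int, backends: List[str]) -> str:
    """Layer 4: only the connection's address and port are visible, never the payload."""
    return backends[_stable_index(f"{src_ip}:{src_port}", len(backends))]


def l7_pick(raw_request: bytes, pools: Dict[str, List[str]]) -> str:
    """Layer 7: parse the request line and headers, then route on their content."""
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    request_line, *header_lines = head.split("\r\n")
    _method, path, _version = request_line.split(" ", 2)
    headers: Dict[str, str] = {}
    for line in header_lines:
        if ":" in line:
            name, value = line.split(":", 1)
            headers[name.strip().lower()] = value.strip()

    # Content-based routing on the URL path.
    pool = pools["api"] if path.startswith("/api/") else pools["web"]

    # Sticky session: honour the cookie if it names a server in the chosen pool.
    cookie = SimpleCookie(headers.get("cookie", ""))
    if "lb_backend" in cookie and cookie["lb_backend"].value in pool:
        return cookie["lb_backend"].value
    return pool[_stable_index(path, len(pool))]
```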
Routing algorithms
| Algorithm | How it works | Best for |
|---|---|---|
| Round-robin | Cycles through servers in order | Homogeneous servers, uniform request cost |
| Weighted round-robin | Servers with higher capacity receive a proportionally larger share | Heterogeneous server pools (mixed instance types) |
| Least connections | Sends to the server with the fewest active connections | Long-lived or variable-duration requests |
| IP hash | Hashes the client IP to deterministically select a server | Stateful sessions without a shared session store |
| Least response time | Sends to the server with the lowest average response time | Latency-sensitive APIs with variable backend performance |
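Each algorithm in the table fits in a few lines. These are simplified single-threaded sketches: the weighted variant uses the "smooth weighted round-robin" scheme that Nginx implements, and the least-response-time variant assumes an exponentially weighted moving average with an arbitrary `alpha`, since products differ in how they measure and average latency.

```python
import hashlib
import itertools
from typing import Dict, List


class RoundRobin:
    def __init__(self, servers: List[str]):
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)


class SmoothWeightedRoundRobin:
    """Heavier servers get proportionally more picks, interleaved rather than in bursts."""

    def __init__(self, weights: Dict[str, int]):
        self.weights = dict(weights)
        self.current = {s: 0 for s in weights}

    def pick(self) -> str:
        total = sum(self.weights.values())
        for s, w in self.weights.items():
            self.current[s] += w
        best = max(self.current, key=self.current.__getitem__)
        self.current[best] -= total
        return best


class LeastConnections:
    def __init__(self, servers: List[str]):
        self.active = {s: 0 for s in servers}

    def acquire(self) -> str:
        server = min(self.active, key=self.active.__getitem__)
        self.active[server] += 1  # connection opened
        return server

    def release(self, server: str) -> None:
        self.active[server] -= 1  # connection closed


def ip_hash(client_ip: str, servers: List[str]) -> str:
    # A stable hash keeps a given client on the same server across LB replicas.
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


class LeastResponseTime:
    def __init__(self, servers: List[str], alpha: float = 0.3):
        self.alpha = alpha
        # Unmeasured servers start at 0 ms, so each is tried at least once.
        self.avg_ms = {s: 0.0 for s in servers}

    def pick(self) -> str:
        return min(self.avg_ms, key=self.avg_ms.__getitem__)

    def record(self, server: str, latency_ms: float) -> None:
        prev = self.avg_ms[server]
        self.avg_ms[server] = (
            latency_ms if prev == 0.0 else (1 - self.alpha) * prev + self.alpha * latency_ms
        )
```

Python's built-in `hash()` is randomised per process for strings, which is why `ip_hash` uses `hashlib`: every load balancer replica must map a given client to the same server.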
Health checks
The load balancer continuously monitors each server by sending periodic probes — an HTTP request to /health, a TCP connection attempt, or a ping. A server that fails a configurable number of consecutive checks is removed from rotation automatically. When it recovers and passes checks again, it is added back. This enables:
- Zero-downtime deployments — drain a server (stop sending it new requests and let in-flight ones finish), update it, wait for health checks to pass, then route traffic back to it. Repeat for each instance.
- Automatic failure recovery — a crashing instance is detected within seconds and removed without manual intervention.
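The remove-and-restore behaviour is a small state machine driven by consecutive probe results. A sketch under stated assumptions: the threshold names and defaults are illustrative (managed load balancers expose similar settings under their own names), and the probe treats any 2xx response from `/health` as a pass.

```python
import http.client
import urllib.request
from dataclasses import dataclass, field


def probe_http(url: str, timeout: float = 2.0) -> bool:
    """One HTTP health probe: a 2xx response within the timeout counts as a pass."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a non-2xx status (urllib raises HTTPError).
        return False


@dataclass
class HealthState:
    """Tracks one backend; thresholds count consecutive probe results."""
    unhealthy_threshold: int = 3  # failures in a row before removal
    healthy_threshold: int = 2    # passes in a row before re-adding
    in_rotation: bool = True
    fails: int = field(default=0, init=False)
    passes: int = field(default=0, init=False)

    def record(self, probe_ok: bool) -> bool:
        """Feed one probe result; return whether the server should receive traffic."""
        if probe_ok:
            self.passes += 1
            self.fails = 0
            if not self.in_rotation and self.passes >= self.healthy_threshold:
                self.in_rotation = True   # recovered: add back to the pool
        else:
            self.fails += 1
            self.passes = 0
            if self.in_rotation and self.fails >= self.unhealthy_threshold:
                self.in_rotation = False  # removed from rotation
        return self.in_rotation
```

A scheduler would call `probe_http` against a (hypothetical) address such as `http://10.0.0.5:8080/health` every few seconds and feed each result to `record()`; worst-case detection time is roughly the probe interval multiplied by `unhealthy_threshold`.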
Redundant load balancers
A single load balancer is itself a single point of failure. Self-managed deployments run load balancers in pairs using active-passive failover — a virtual IP (VIP) floats between the two; if the active one fails, the passive one takes over the VIP within seconds. Cloud-managed load balancers (AWS ALB, GCP Load Balancer) handle this transparently.
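The VIP handover can be modelled as the passive node counting missed heartbeats from the active one. This is a toy model with hypothetical node names; real deployments typically rely on VRRP (for example via keepalived), which also handles priorities and preemption.

```python
from dataclasses import dataclass, field


@dataclass
class VipPair:
    """Toy active-passive pair sharing one floating virtual IP."""
    active: str
    passive: str
    dead_after: int = 3  # missed heartbeats before the passive node takes over
    missed: int = field(default=0, init=False)

    def heartbeat_interval(self, heard_from_active: bool) -> str:
        """Called once per heartbeat interval on the passive node; returns the VIP owner."""
        if heard_from_active:
            self.missed = 0
        else:
            self.missed += 1
            if self.missed >= self.dead_after:
                # Promote the passive node; it now answers for the VIP. In practice it
                # also sends gratuitous ARP so the network learns the new owner.
                self.active, self.passive = self.passive, self.active
                self.missed = 0
        return self.active
```

With a one-second heartbeat and `dead_after=3`, this model fails over after roughly three seconds of silence from the active node.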
API Gateway
An API gateway is a specialised reverse proxy that acts as the single entry point for all client requests to a microservice backend. Where a basic reverse proxy routes and forwards, an API gateway also enforces cross-cutting concerns that would otherwise be duplicated across every service.
What an API gateway handles
- Authentication and authorisation — validates JWTs or API keys on every incoming request, rejecting unauthenticated calls before they reach any service. Individual services trust the gateway and do not need their own auth logic.
- Rate limiting and throttling — enforces per-client, per-endpoint, or global request quotas. Prevents abuse and protects backend services from traffic spikes.
- Request routing — routes `/users/*` to the user service, `/orders/*` to the order service, and so on — based on path, method, or headers.
- Protocol translation — exposes a REST or GraphQL API to external clients while communicating with internal services over gRPC, which clients cannot call directly from a browser.
- Request/response transformation — reshapes payloads between client format and service format, aggregates responses from multiple services into one (BFF pattern), or strips internal fields before returning to the client.
- Observability — centralised logging, distributed tracing correlation, and metrics collection at the system boundary.
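The first three responsibilities compose into a short per-request pipeline. A minimal sketch, assuming API-key authentication (a simpler stand-in for JWT validation, whose signature checks are omitted here), a per-client token bucket for rate limiting, and path-prefix routing; the keys, client IDs, and service names are hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, clock: Callable[[], float] = time.monotonic):
        self.capacity, self.rate, self.clock = capacity, rate, clock
        self.tokens = capacity
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Gateway:
    """Toy gateway: authenticate, rate-limit per client, then route by path prefix."""

    def __init__(self, api_keys: Dict[str, str], routes: Dict[str, str],
                 capacity: float = 10, rate: float = 5,
                 clock: Callable[[], float] = time.monotonic):
        self.api_keys = api_keys  # api key -> client id
        self.routes = routes      # path prefix -> internal service
        self.capacity, self.rate, self.clock = capacity, rate, clock
        self.buckets: Dict[str, TokenBucket] = {}

    def handle(self, path: str, headers: Dict[str, str]) -> Tuple[int, Optional[str]]:
        """Return (status, upstream). 200 means "forward to upstream".

        Header names are assumed to be lower-cased already.
        """
        client = self.api_keys.get(headers.get("x-api-key", ""))
        if client is None:
            return 401, None  # rejected before any service sees the request
        bucket = self.buckets.get(client)
        if bucket is None:
            bucket = self.buckets[client] = TokenBucket(self.capacity, self.rate, self.clock)
        if not bucket.allow():
            return 429, None  # Too Many Requests
        for prefix, service in self.routes.items():
            if path.startswith(prefix):
                return 200, service
        return 404, None
```

Because the checks run in the gateway, the user and order services behind it never see unauthenticated or over-quota traffic.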
API Gateway vs Load Balancer vs Reverse Proxy
| | Reverse Proxy | Load Balancer | API Gateway |
|---|---|---|---|
| Primary job | Forward inbound traffic and shield backends | Distribute load across instances | Enforce cross-cutting concerns |
| Auth / rate limiting | Possible but manual | No | First-class, built-in |
| Protocol translation | No | No | Yes (REST ↔ gRPC, etc.) |
| Response aggregation | No | No | Yes |
| Typical layer | Edge or internal | Edge or internal (between tiers) | Edge — sits in front of all services |
| Examples | Nginx, HAProxy, Envoy | AWS ALB/NLB, GCP LB, HAProxy | AWS API Gateway, Kong, Apigee, Traefik |
In practice the boundaries blur — Nginx can do light auth, an ALB can do header-based routing, and Kong is both a reverse proxy and a full API gateway. What matters is understanding what responsibility you are assigning to each layer, not what the product is called.