System Design Nuggets


Deconstructing Serverless: API Gateways, Cold Starts, and Stateless Functions

Learn how modern system design replaces traditional load balancers with serverless architecture to handle unpredictable internet traffic.

Arslan Ahmad
Feb 24, 2026

This blog will cover:

  • Traditional web server limits

  • Serverless computing core concepts

  • Event-driven scaling mechanisms

  • Stateless system design rules

  • Solving database connection limits

Building software systems that handle massive internet traffic introduces a fundamental hardware problem.

A computer application running on physical or virtual hardware has strict processing limits. When a sudden surge of network requests arrives, active servers quickly run out of memory.

The system becomes completely overwhelmed and stops responding to new requests.

This failure results in a severe outage that can cascade across the supporting infrastructure.

To prevent these outages, engineering teams historically kept excess servers running continuously. They paid for idle computing power to ensure capacity was always available for unexpected surges.

This approach creates massive financial waste because machines sit completely unused during quiet periods.

Solving this discrepancy between fixed computing capacity and unpredictable network traffic is a central challenge in modern software engineering.

Mastering the architectural shift from static servers to dynamic cloud infrastructure is essential for understanding modern system design.

Join my newsletter or subscribe to my publication to receive future guides and resources.

The Traditional Architecture Baseline

To understand modern cloud patterns, one must first examine how standard web applications operate.

The traditional system design approach relies heavily on dedicated machines running continuously. Developers deploy their application code onto a Virtual Machine.

This is an isolated software environment that behaves exactly like a physical computer.

The application code remains active in the background at all times. It constantly listens on a specific network port for incoming data payloads.
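The always-on model described above can be sketched in a few lines. This is a minimal illustration, not production code: a single long-lived process binds to one port and serves every request itself. The port number and response text are illustrative assumptions.

```python
# Minimal sketch of the traditional model: one long-lived process
# stays resident and listens on a fixed port for every request.
from http.server import BaseHTTPRequestHandler, HTTPServer

class EchoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The same resident process handles every incoming request.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"hello from a long-running server\n")

# In production this process would block forever, waiting for traffic:
#   HTTPServer(("0.0.0.0", 8080), EchoHandler).serve_forever()
```

Because the process never exits, it consumes memory and CPU even when no requests arrive, which is exactly the idle-capacity cost discussed earlier.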

When network traffic is low, a single virtual machine handles all incoming requests effortlessly. However, as traffic grows, a single machine will eventually hit its maximum hardware capacity.

When a single machine reaches its processing limit, the system must scale to handle the workload. The standard approach is called Horizontal Scaling.

This strategy involves adding entirely new virtual machines to the existing network cluster. By distributing the application code across multiple machines, the overall processing power increases.

However, adding new virtual machines is a surprisingly slow technical process.

A monitoring service must first detect that the active processor usage is dangerously high. The cloud provider must allocate hardware, boot an operating system, and establish database connections. This initialization sequence easily takes several minutes to complete.

If network traffic spikes instantly, existing servers crash long before new servers finish booting. This slow reaction time makes horizontal scaling inefficient for volatile traffic patterns. It leaves systems vulnerable to sudden bursts of user activity.
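The reactive scaling loop described above can be sketched as a simple threshold check. The 80% CPU threshold and the three-minute boot time are illustrative assumptions, not figures from any specific cloud provider.

```python
# Hedged sketch of reactive horizontal scaling: a monitor checks CPU
# utilization and only then asks the provider for another VM.
BOOT_TIME_SECONDS = 180        # assumed: allocate hardware, boot OS, connect
CPU_SCALE_UP_THRESHOLD = 0.80  # assumed scale-up trigger

def scaling_decision(cpu_utilization: float, pending_boots: int) -> str:
    """Decide whether to request one more VM from the cloud provider."""
    if cpu_utilization >= CPU_SCALE_UP_THRESHOLD and pending_boots == 0:
        # The new VM cannot accept traffic for BOOT_TIME_SECONDS, so a
        # sudden spike can still overwhelm the fleet in the meantime.
        return "provision-new-vm"
    return "hold"
```

The gap between the moment the threshold trips and the moment the new machine serves traffic is the window in which a sudden spike can crash the existing fleet.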

The Role of the Load Balancer

Running multiple virtual machines requires a mechanism to distribute incoming network traffic.

This is where the Load Balancer becomes a mandatory architectural component.

A load balancer is a dedicated network device that sits directly in front of the application servers. It acts as the single entry point for every incoming network request.

When a request arrives, the load balancer inspects it and forwards it to an available backend server. It uses routing algorithms to decide which server receives each request.

A popular method is the round robin algorithm.

This algorithm routes the first request to the first server, the second to the second server, and so on, cycling back to the first server after reaching the end of the list.
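Round robin can be sketched in a couple of lines. The server names here are placeholders; a real load balancer would hold addresses of live backends.

```python
import itertools

# Minimal sketch of round-robin routing over a fixed backend list.
servers = ["server-1", "server-2", "server-3"]  # placeholder backends
rotation = itertools.cycle(servers)

def route_round_robin() -> str:
    """Return the next backend in strict sequential rotation."""
    return next(rotation)
```

Each call advances the rotation by one position, so traffic is spread evenly as long as all requests cost roughly the same to serve.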

Another routing method checks the active workload of each machine.

The load balancer identifies which server currently has the fewest active network connections. It then forwards new traffic to that specific machine. This ensures an even distribution of processing work across the entire server cluster.
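This least-connections strategy can be sketched the same way. The connection counts below are made-up starting values for illustration.

```python
# Sketch of least-connections routing: track in-flight connections per
# backend and send new traffic to the least busy one.
active_connections = {"server-1": 12, "server-2": 3, "server-3": 7}

def route_least_connections() -> str:
    """Pick the backend with the fewest active connections."""
    server = min(active_connections, key=active_connections.get)
    active_connections[server] += 1  # the new request is now in flight
    return server
```

Unlike round robin, this method adapts when some requests are slower than others, because a backend stuck on heavy work accumulates connections and stops receiving new traffic.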

The Shift to Serverless Architecture

The financial cost of idle servers and the slow speed of horizontal scaling drove the software industry to seek better solutions. Engineers needed a way to execute code without managing underlying operating systems.

This technical requirement led to the creation of the serverless computing model. It fundamentally changes how cloud infrastructure operates.

The term Serverless Computing can be slightly confusing for beginners. Physical computers still exist in massive cloud data centers. The software code must still run on actual hardware processors.

The term simply means that the software developer is no longer responsible for managing those servers.

The cloud provider takes total control of the hardware allocation. The developer only writes the specific application logic and uploads it to the cloud.

The provider guarantees that the code will execute exactly when needed. They handle all backend provisioning automatically.

Function as a Service

The most common implementation of this architecture is known as Function as a Service.

In this model, developers break their monolithic applications down into tiny isolated blocks of code. These individual blocks are called functions. Each function handles one highly specific technical task.
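One such isolated function might look like the sketch below. The handler signature (an event dictionary plus a context object) follows a common Lambda-like convention, and the event shape and function name are illustrative assumptions, not tied to any one provider.

```python
import json

# Hedged sketch of a single FaaS-style function: stateless, doing
# exactly one task, then returning. Event shape is an assumption.
def image_upload_handler(event: dict, context: object = None) -> dict:
    """Handle one isolated task: acknowledge an uploaded-image event."""
    image_key = event.get("image_key", "unknown")
    # No state is kept between invocations; everything the function
    # needs arrives in the event payload.
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"processed {image_key}"}),
    }
```

Because the function holds no state between invocations, the platform can run many copies of it in parallel without coordination, which is what makes the event-driven scaling described above possible.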
