Lesson 1: What Serverless Really Means

What this lesson covers

  • The shift from capacity planning to event-driven scaling
  • Three core architectural changes in serverless thinking
  • When serverless wins and when it struggles
  • A cloud-agnostic lens for serverless architecture

Read time: 10–12 minutes


What you'll learn

  1. Serverless as responsibility transfer: You stop managing servers; the provider manages infrastructure, security, and scaling.
  2. Billing model revolution: Pay for invocations and execution time, not provisioned capacity—a fundamental shift in cost structure and design.
  3. Mental model change: Think in events and functions, not instances and connections—this shapes every architectural decision.

Simple Explanation

What it is

Serverless is a way of running code where you upload a function and the platform decides when and where it runs. You do not manage servers, patch operating systems, or keep machines warm. You focus on the events that trigger your code and the data it reads or writes.

Why we need it

Most teams spend a lot of time guessing traffic, over-provisioning servers, and paying for idle capacity. Serverless turns that problem into a billing model that matches real usage and lets small teams ship without a large operations burden.

Benefits

  • Automatic scaling when traffic spikes without you changing capacity.
  • Pay-per-use costs that drop to near-zero when nothing runs.
  • Faster delivery because you can deploy small functions quickly.

Tradeoffs

  • Cold starts can add noticeable latency for infrequent traffic.
  • Stateless design is required, so state must live outside the function.
  • Always-on workloads may be cheaper on reserved servers or containers.

Real-world examples (architecture only)

  • Image upload: Storage event → Function → Thumbnail in storage.
  • Order processing: API request → Function → Database write → Event for notifications.
  • Scheduled cleanup: Timer → Function → Delete expired records.

Core Concept: Serverless as a Spectrum of Responsibility

Serverless is not a technology; it's a responsibility boundary. It answers the question: Who manages what?

Every computing model can be placed on a spectrum from "you manage everything" to "the provider manages everything":

On-Premises / Physical Servers

  • You manage: Hardware, BIOS, OS, networking, power, cooling
  • Provider manages: Nothing
  • Your responsibility: ~100%

IaaS (EC2, GCP Compute Engine)

  • You manage: OS, runtime, application, scaling policy, capacity planning
  • Provider manages: Hardware, power, networking fabric, hypervisor
  • Your responsibility: ~60%

PaaS (App Engine, Cloud Run, Heroku)

  • You manage: Application code, configuration
  • Provider manages: OS, runtime, scaling, infrastructure
  • Your responsibility: ~30%

FaaS (AWS Lambda, Google Cloud Functions)

  • You manage: Application code, permissions, monitoring hooks
  • Provider manages: Everything else (server, OS, runtime, scaling, capacity, patching)
  • Your responsibility: ~10%

The serverless label means: The provider abstracts away infrastructure entirely. You write functions. The platform invokes them.

This is not "no servers": servers still exist. It means the provider owns the server lifecycle, so you stop thinking in terms of servers at all. You think about events, functions, and state.


The Billing Model Revolution: From Capacity to Consumption

Understanding serverless requires understanding how billing changed.

Traditional Billing (VMs, IaaS)

Monthly cost = (Hourly rate) × (Hours) × (Number of instances)

You pay for time, regardless of usage. A 2 vCPU instance running idle costs the same as one at 100% utilization.

Problem: You provision for peak load to handle traffic spikes. But average load is 20% of peak. You pay for 80% idle capacity.

Example:

  • Peak traffic: 200 requests/second (needs 10 instances)
  • Average traffic: 40 requests/second
  • You provision 10 instances, but only need 2
  • Result: 80% of your compute cost is for idle capacity

Serverless Billing (FaaS)

Monthly cost = (Invocation count × price per request) + (GB-seconds of execution × price per GB-second)

You pay only for compute actually consumed.

Advantage: No idle cost. A bursty workload (0 requests, then 1000 requests) scales instantly without paying for provisioned but unused capacity.

Example (same workload as above):

  • Peak: 200 requests/second, 100ms each, 512MB allocated
  • Average: 40 requests/second, 100ms each, 512MB allocated
  • Cost: Only for the 40 (or 200) requests and their execution time—no capacity reservation

The Cost Flip

For workloads with variable traffic, serverless is often 50–80% cheaper.

For workloads that run constantly, serverless can be 2–5x more expensive, because reserved capacity amortizes to a lower per-unit cost.

This cost model has architectural implications:

  • Small, isolated workloads become economical (fine-grained services).
  • Always-on background tasks become expensive (consolidate them).
  • Bursty event handling becomes cheap (async processing flourishes).
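
To make the flip concrete, here is a minimal cost-model sketch in Python. The VM hourly rate is an assumed placeholder and the FaaS rates are approximate published figures; substitute current provider pricing before relying on the numbers.

# Illustrative cost model: rates below are assumptions, not live pricing.
VM_HOURLY = 0.05              # $/hour per always-on instance (assumed)
FAAS_GB_SECOND = 0.0000166    # $/GB-second (approximate published rate)
FAAS_PER_REQUEST = 0.0000002  # $/invocation (approximate published rate)


def vm_monthly(instances: int) -> float:
    # Paid for every hour, busy or idle: ~730 hours in a month
    return VM_HOURLY * 730 * instances


def faas_monthly(requests: int, seconds: float, gb: float) -> float:
    # Paid only per request and per GB-second actually consumed
    return requests * (FAAS_PER_REQUEST + seconds * gb * FAAS_GB_SECOND)


# Bursty workload: 10M requests/month, 100ms each, 512MB
print(vm_monthly(instances=2))              # ~73.0 (mostly idle capacity)
print(faas_monthly(10_000_000, 0.1, 0.5))   # ~10.3

# Always-on workload: 500M requests/month flips the comparison
print(faas_monthly(500_000_000, 0.1, 0.5))  # ~515.0

Run it and the bursty workload comes out around $10/month against roughly $73 for two always-on instances, while the sustained workload's pay-per-use bill climbs past $500, showing both sides of the flip.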

Three Core Architectural Shifts

1. From Capacity Planning to Automatic Scaling

Traditional: You forecast peak load, provision servers, manage scaling rules. If you underestimate, requests queue. If you overestimate, you pay for idle capacity.

Serverless: The platform scales automatically. A sudden spike in traffic (from 10 requests/second to 1000) is handled instantly without configuration.

Implication:

  • You stop thinking about "How many servers do I need?"
  • You start thinking about "What is the acceptable latency for my function cold start?" (varies by runtime and configuration—measure with your workload)
  • You design assuming stateless functions that can scale to 1000s of parallel invocations

2. From Stateful Connections to Ephemeral Execution

Traditional: A server maintains state in memory—sessions, database connections, caches. The server is a persistent resource.

Serverless: Each function invocation is a fresh process. It starts, runs, exits. No persistent in-memory state between invocations.

Implication:

  • All state moves to external stores: databases, caches, object storage
  • Connection pooling works differently (or doesn't work at all—every invocation opens fresh connections)
  • Caching shifts from in-process to external (shared caches like Redis, DynamoDB)
  • Transaction design changes (you lose in-process ACID; you adopt eventual consistency)
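
As a sketch of that caching shift, the function below does a read-through lookup against an external Redis instead of an in-process dictionary. It assumes the redis-py client, a REDIS_HOST environment variable, and a db object with a fetch_user method; all three are illustrative stand-ins.

import json
import os

import redis  # assumes the redis-py package is bundled with the function

# Module scope runs once per container, but treat it as disposable:
# the cache itself lives in Redis, outside the function.
cache = redis.Redis(host=os.environ.get("REDIS_HOST", "localhost"), port=6379)


def get_user(user_id: str, db) -> dict:
    """Read-through cache: shared across all invocations, survives restarts."""
    cached = cache.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)
    user = db.fetch_user(user_id)  # db is a stand-in for your data store
    cache.setex(f"user:{user_id}", 300, json.dumps(user))  # 5-minute TTL
    return user

Because the cache lives outside the function, every concurrent invocation sees the same entries, and nothing is lost when instances are recycled.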

3. From Careful Deployments to Instant Updates

Traditional: A deployment takes 5–15 minutes. Traffic gradually shifts to the new version. If something goes wrong, you do a rolling rollback.

Serverless: Upload new code and it runs immediately. Zero warm-up time. Rollback is instant (upload the old code).

Implication:

  • You can deploy confidently and frequently (enables CI/CD velocity)
  • You need robust feature flags and observability (bad code gets to production faster)
  • Cold start latency becomes a real consideration for user-facing functions
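
One inexpensive safeguard, sketched below, is an environment-variable feature flag; the flag name and the two code paths are hypothetical.

import os


def stable_path(event):
    return {"version": "stable"}


def experimental_path(event):
    return {"version": "experimental"}


def handler(event, context):
    # Flip USE_EXPERIMENTAL off in the function's configuration to
    # revert behavior instantly, without redeploying code.
    if os.environ.get("USE_EXPERIMENTAL") == "true":
        return experimental_path(event)
    return stable_path(event)

Flipping the variable in the function's configuration reverts behavior without a redeploy, which pairs well with the instant-rollback property above.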

The Mental Model: Event → Function → State

Serverless systems follow a single repeating pattern:

Event → Function → State → Response

That is the entire model. Everything flows through it:

  • Order placed (event) → Lambda (function) → DynamoDB (state) → Response
  • Image uploaded (event) → Cloud Function (function) → Cloud Storage + Firestore (state) → Response
  • Scheduled time (event) → Cloud Scheduler (trigger) → Cloud Function → Database update

When you understand this pattern, serverless design becomes systematic.


When Serverless Wins

Use serverless architecture when:

  • Traffic is unpredictable or bursty: Morning spike, evening quiet. Zero to 1000 requests/second. Serverless scales with demand—you don't pay for idle time.
  • Average utilization is <30% of peak capacity: You have obvious idle time. Serverless eliminates that waste.
  • Workloads are independent: Ten different microservices, each with its own scaling needs. Serverless handles per-service auto-scaling.
  • Cold start is acceptable: Your SLA tolerates variable startup latency (measure and profile with your workload).
  • Functions complete within platform timeouts: Time limits vary by provider and generation. Longer jobs need different tools.
  • You value deployment speed: New code runs instantly. Rollback is instantaneous.

Example: Image Thumbnail Generation

  • Event: User uploads image
  • Function: Lambda converts to three thumbnail sizes
  • Execution: 2–3 seconds
  • Frequency: 1–50 uploads/day (highly variable)
  • Cost on VM: $50–100/month (running 24/7 with 99% idle time)
  • Cost on serverless: $2–8/month (pay only for execution time)

When Serverless Struggles

Do not use serverless when:

  • Workload is always-on: A service running 24/7, constantly processing. Once you factor in invocation overhead and cold starts, reserved capacity is cheaper.
  • Function needs very low latency: Cold start can be non-trivial. If your SLA is strict, use containers or reserved capacity.
  • Execution is long-running: Functions have maximum duration limits. Multi-hour jobs need other compute models.
  • Reserved capacity is cheaper per unit of compute: At extreme scale (millions of GB-seconds/month), reserved capacity amortizes cheaper than pay-per-use.
  • You need deep runtime customization: Serverless platforms have limited languages and versions. If you need a specific C++ library or kernel module, you need VMs.
  • State is heavyweight: If your function needs to load 10GB of data into memory on every invocation, cold start and memory costs become prohibitive.

Example: Real-Time Trading System

  • Requirement: Sub-100 ms latency on order placement
  • Workload: 24/7, always processing market data
  • State: Maintains in-memory order books (GB-scale)
  • Solution: Reserved containers or VMs (not serverless)

AWS Lambda vs Google Cloud Functions: Conceptual Parity

Both platforms provide serverless compute. Both follow the same pattern. The main differences are operational and cost-based, not conceptual.

Aspect | AWS Lambda | Google Cloud Functions
Trigger sources | Broad set of event sources (see AWS docs) | Broad set of event sources (see GCP docs)
Cold start | Varies by runtime and configuration; measure for your workload | Varies by runtime and configuration; measure for your workload
Maximum timeout | See AWS Lambda limits | See Google Cloud Functions limits
Memory range | See AWS Lambda limits (varies by region/runtime) | See Google Cloud Functions limits
State store | DynamoDB (native integrations) | Firestore, Spanner, Memorystore
Event routing | SNS, SQS, EventBridge (rich decoupling) | Pub/Sub (simpler, pure publish-subscribe)
Cost model | See AWS Lambda pricing | See Google Cloud Functions pricing

For architectural thinking: These differences don't change the design pattern. Both are serverless. Both require stateless functions, event-driven design, and external state management.


Signals You're Thinking About Serverless Correctly

If you find yourself asking these questions, you're building serverless-native systems:

  • "How do I decompose this into independent functions that can scale separately?"
  • "Where does state live outside the function?"
  • "What happens if this function times out or fails halfway through?"
  • "How do I make this function idempotent (safe to retry)?"
  • "What's the cold start impact on user experience?"

The opposite (wrong approach):

  • "Can I run a Flask app in Lambda?" (Technically possible, but you're fighting the model.)
  • "How do I keep a database connection warm across invocations?" (You don't—each invocation is fresh.)
  • "This needs 10GB of state in memory" (Use a stateful service, not serverless.)

Cloud-Agnostic Architecture Lens

Serverless design should be portable in concept, even when the services differ. Use this mental model to stay cloud-agnostic:

  1. Event source: HTTP, queue, object storage, schedule
  2. Compute: Function runtime (short-lived)
  3. State: External store (database, cache, object storage)
  4. Orchestration: Workflow engine or event choreography
  5. Observability: Logs, metrics, traces

If you can describe each part in vendor-neutral terms, you can implement it on AWS, GCP, or any other cloud.


Python Example: Event → Function → State (Cloud-Agnostic)

Below is a Python-style example that shows the shape of a serverless function handling an event, validating input, and writing to a data store. The APIs are generic so the pattern stays cloud-agnostic.

from typing import Dict, Any


def handle_event(event: Dict[str, Any], store) -> Dict[str, Any]:
    """Process an incoming event and persist normalized data."""
    # 1) Validate and normalize input
    payload = event.get("payload", {})
    user_id = payload.get("user_id")
    action = payload.get("action")
    if not user_id or not action:
        return {"status": "error", "message": "Missing user_id or action"}

    record = {
        "user_id": user_id,
        "action": action,
        "timestamp": event.get("timestamp"),
    }

    # 2) Write to external state (database, cache, or object store)
    store.put(record)

    # 3) Return a response (HTTP or event-ack)
    return {"status": "ok", "record": record}

What this does and why it helps:

  • It treats the event as the source of truth, which keeps the function stateless.
  • It pushes state out to a durable store so any retry is safe.
  • It works across clouds because the contract is the same: event in → state write → response out.
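
A quick local usage example for the handler above, with an in-memory stub standing in for the durable store (InMemoryStore is a test double, not a real service):

class InMemoryStore:
    """Test double standing in for a durable external store."""

    def __init__(self):
        self.records = []

    def put(self, record):
        self.records.append(record)


store = InMemoryStore()
event = {
    "payload": {"user_id": "u42", "action": "click"},
    "timestamp": "2024-01-01T00:00:00Z",
}
print(handle_event(event, store))
# {'status': 'ok', 'record': {'user_id': 'u42', 'action': 'click', ...}}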

Practice: Project (Cloud-Agnostic)

Design a serverless event ingestion pipeline for a fictional product:

  • Input: User actions (clicks, purchases, page views)
  • Processing: Validate and enrich events
  • Storage: Write to a durable store
  • Output: A daily summary for analytics

Deliverables:

  1. Describe the architecture in vendor-neutral terms (event source, compute, state, orchestration, observability).
  2. Choose a cloud and map each component to services (AWS, GCP, or another).
  3. Explain why you chose each service based on performance, cost, and operational simplicity.

If you want feedback, email your write-up to maarifaarchitect@gmail.com.



Common Mistakes

  1. Treating serverless as a cost reducer: Serverless is cheaper for bursty, variable traffic. It's more expensive for always-on workloads. Choose based on traffic pattern, not buzzwords.

  2. Ignoring cold start in user-facing functions: A 200ms cold start doesn't sound bad until users perceive a 200ms delay on every first interaction. Measure and design around it.

  3. Forgetting idempotency: Serverless platforms retry failed invocations. If your function isn't idempotent, retries cause duplicates (double-charges, duplicate entries). Design for retries from day one.

  4. Coupling functions too tightly: If Function A must wait for Function B synchronously, you've lost the scaling benefit. Keep function dependencies asynchronous and loose.

  5. Overprovisioning memory to get CPU: Serverless pricing ties CPU to memory allocation. If you allocate 3GB RAM to get 2 vCPU, you're paying for unused memory. Use memory right-sizing tools.
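
Mistake 3 is worth a sketch. A common idempotency pattern is a conditional write keyed on a unique event ID, shown here with boto3 against DynamoDB; the table name and the assumption that events carry an "id" field are illustrative.

import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("processed-events")  # assumed table


def handler(event, context):
    event_id = event["id"]  # assumes the event carries a unique ID
    try:
        # The conditional write succeeds only the first time this ID is seen,
        # so a retried invocation becomes a harmless no-op.
        table.put_item(
            Item={"event_id": event_id, "status": "processed"},
            ConditionExpression="attribute_not_exists(event_id)",
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return {"status": "duplicate", "event_id": event_id}
        raise
    # ... perform the real side effect here, after claiming the event ...
    return {"status": "processed", "event_id": event_id}

The conditional write acts as a claim check: the first invocation wins, and any retry sees the existing item and exits without repeating the side effect.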


What Comes Next

With this foundation—understanding that serverless shifts from planning capacity to triggering functions asynchronously—you're ready to explore:

  • Lesson 2: Event-driven thinking (how to decompose systems into producers and consumers)
  • Lesson 3: Stateless versus stateful (designing data flows around ephemeral functions)
  • Lesson 4: Loose coupling principles (building systems that scale independently)

Key Takeaway: Serverless is a contract: you write functions, the platform scales them. Cost shifts from provisioned capacity to consumption. Architecture shifts from stateful monoliths to distributed event-driven systems.

Think of infrastructure as a hierarchy of ownership decisions (the shared responsibility model).

Serverless doesn't mean "no servers." It means the cloud provider manages the servers completely. Your job is to write functions and let the platform handle provisioning, scaling, and capacity.


The Billing Unit Revolution

Understanding serverless requires understanding how billing shifted:

Traditional Models (IaaS/VMs)

Cost = (Hourly Rate) × (Hours Provisioned) × (Number of Instances)

You pay for time, regardless of usage. A 2 vCPU instance costs the same whether it processes 1 request or 1 million.

Problem: You're paying for idle time. If you provision for peak load and traffic is 10% of peak, you're paying 90% for capacity you don't use.

Serverless Model (FaaS)

Cost = (Price per GB-second) × (Memory Allocated) × (Execution Time) + (Number of Invocations) × (Price per Invocation)

You pay for actual compute consumed, not provisioned capacity.

Implication: A function that runs for 100ms on 512MB RAM costs proportionally less than one running 10 seconds. Idle time costs nothing.

Cost Comparison Example

Assume:

  • Traditional VM: 2 vCPU, 4GB RAM on EC2 = $200/month
  • Serverless function: 512MB RAM, 100ms average duration, 10M invocations/month

Traditional: $200/month (fixed, regardless of usage)

Serverless:

  • Compute: (0.1 seconds) × (10M invocations) × (0.5GB) × ($0.0000166/GB-second) ≈ $8.30
  • Invocations: (10M) × ($0.0000002) = $2
  • Total: ~$10/month

For this underutilized workload, serverless is roughly 95% cheaper. But the comparison flips at sustained high utilization; we'll cover that in Lesson 7.


Architectural Implications: How Serverless Changes Your Mental Model

1. The Concurrency Model Changes

In traditional architectures, you think about instances:

  • "We have 10 servers handling requests"
  • "Each server has 4 worker threads"
  • "At peak, we spin up 50 servers"

In serverless, you think about function invocations:

  • "Each event triggers a fresh function instance"
  • "The platform handles concurrency automatically"
  • "We have account-level concurrency limits (usually 1000s)"

System Design Impact: You can't hold connection pools or caches in memory across invocations. Each invocation is ephemeral.

2. State Must Live Elsewhere

Traditional architecture: state lives inside the long-running process (stateful flow).

Serverless architecture: state lives in external stores; the function itself is stateless (stateless flow).

Example: A user login flow in Traditional vs Serverless

# TRADITIONAL (Python on VM)
user_cache = {}  # In-memory cache, persists across requests


def get_user(user_id):
    if user_id in user_cache:
        return user_cache[user_id]  # Cache hit

    user = db.query(user_id)  # db: a long-lived client held by the process
    user_cache[user_id] = user
    return user


# SERVERLESS (AWS Lambda)
# No reliable in-memory cache (each invocation may land on a fresh instance)
import json

import boto3

dynamodb = boto3.resource("dynamodb").Table("users")  # table name assumed


def handler(event, context):
    user_id = event["pathParameters"]["id"]
    # Must query the database (or an external cache like ElastiCache) on every invocation
    user = dynamodb.get_item(Key={"id": user_id}).get("Item")
    return {
        "statusCode": 200,
        "body": json.dumps(user),
    }

This fundamental difference shapes everything downstream—database design, caching strategy, connection management.

3. Pricing Changes Your Architecture

Because compute cost is directly tied to execution time:

  • Optimize for speed, not resource utilization
  • Minimize critical path: Remove unnecessary processing
  • Cache results aggressively: Use CloudFront, ElastiCache, API response caching
  • Batch small operations: 1000 operations in 50ms beats 1000 separate invocations
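
A sketch of the batching point, shaped like a queue-triggered function that processes a whole batch of records per invocation (the event shape follows the common SQS-to-Lambda pattern; treat the details as assumptions):

import json


def process_batch(items):
    # Batch the downstream work too: one bulk write beats N single writes
    return [{"id": item.get("id"), "ok": True} for item in items]


def handler(event, context):
    # A queue-triggered function receives a batch of records per invocation:
    # one cold start and one set of connections amortized over many items.
    items = [json.loads(record["body"]) for record in event["Records"]]
    results = process_batch(items)
    return {"processed": len(results)}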

AWS Lambda vs Google Cloud Functions: Core Differences

While conceptually similar, AWS Lambda and Google Cloud Functions make different architectural assumptions:

Aspect | AWS Lambda | Google Cloud Functions
Cold start (512MB) | See AWS Lambda limits | See Google Cloud Functions limits
Minimum billable duration | See AWS Lambda pricing | See Google Cloud Functions pricing
Concurrency model | See AWS Lambda limits | See Google Cloud Functions limits
Supported runtimes | Node.js, Python, Go, Java, Ruby, .NET (plus custom runtimes) | Node.js, Python, Go, Java, Ruby, PHP, .NET
Invocation model | Pull-based (polling) and push (events) | Native event subscriptions
Networking | VPC support | VPC Connector
Pricing model | Per GB-second + invocations + networking | Per GB-second + invocations (simpler)

Example: AWS Lambda with Reserved Concurrency

# AWS Lambda - can reserve concurrency for predictable workloads
# This guarantees your function gets X concurrent executions
# Surplus traffic waits or is throttled
import json
import time


def handler(event, context):
    # This function has reserved concurrency of 100
    # Means: 100 concurrent invocations guaranteed
    # Beyond that: queue or throttle
    result = process_event(event)
    return {
        "statusCode": 200,
        "body": json.dumps({"success": True, "result": result}),
    }


def process_event(event):
    # Processing logic (simulated work)
    time.sleep(0.1)
    return {"processed": True}

Example: Google Cloud Functions - Automatic Concurrency

# Google Cloud Functions - no reserved concurrency concept
# Instead: per-instance concurrency (configurable in 2nd gen; see GCP docs)
import time


def handler(request):
    # Automatically parallelized across function instances
    # No reservation needed; scales based on traffic
    event_data = request.get_json(silent=True) or {}
    result = process_event(event_data)
    return {"success": True, "result": result}


def process_event(event_data):
    # Processing logic (simulated work)
    time.sleep(0.1)
    return {"processed": True}

This difference matters for predictable, bursty workloads. AWS offers more control; GCP offers simpler defaults.


The Responsibility Shift: What Changes vs What Doesn't

What Serverless Removes From Your Plate

Traditional concern | With serverless
Server selection (CPU, RAM, disk) | Provider chooses hardware; you pick a memory tier
OS patches | Patched automatically
Scaling configuration (ASGs, load balancers) | Automatic based on traffic
Container orchestration | Hidden from you
Infrastructure monitoring | Built-in (CloudWatch / Stackdriver)

What You Still Own (This Is Critical)

Area | Your responsibility
Application logic | Writing correct code
Security | IAM roles, permissions, secrets management
Data consistency | Handling eventual consistency, idempotency
Cost control | Monitoring execution time, memory allocation, invocation rates
Observability | Logs, metrics, distributed tracing
Timeout configuration | Setting realistic max execution time
Concurrency limits | Understanding and managing concurrent invocations
Architecture decisions | Choosing when/where to use serverless vs other models
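
Several of these knobs are plain API calls. Below is a sketch using boto3 to set timeout and memory on a Lambda function; the function name is a placeholder.

import boto3

lam = boto3.client("lambda")

# Timeout and memory are your responsibility: set them deliberately,
# since memory allocation also determines CPU share and cost per second.
lam.update_function_configuration(
    FunctionName="my-function",  # placeholder name
    Timeout=30,                  # seconds: fail fast instead of paying for hangs
    MemorySize=512,              # MB: right-size with profiling, not guesswork
)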

Architectural Decision Framework: When This Concept Matters Most

Real World Example: Photo Processing Pipeline

A SaaS platform needs to process user-uploaded images (resize, watermark, compress).

Traditional Approach:

  • 3 EC2 instances (m5.xlarge) running image processors
  • Auto-scaling group (2-10 instances based on CPU)
  • Cost: $50-500/month depending on load
  • Utilization: 20-80% (paying for idle capacity)

Serverless Approach:

  • Lambda function triggered by S3 upload event
  • Runs only when images are uploaded
  • Cost: $2-50/month depending on image volume
  • Utilization: Near 100% (pay only when processing)

The serverless option works because:

  1. Workload is intermittent (doesn't need always-on servers)
  2. Execution is short (images process in 1-5 seconds)
  3. No state carries across invocations (each image is independent)
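
A sketch of that pipeline's function, triggered by an object-storage upload. It assumes Pillow is packaged with the deployment; the bucket layout and thumbnail sizes are illustrative.

from io import BytesIO

import boto3
from PIL import Image  # assumes Pillow is packaged with the deployment

s3 = boto3.client("s3")
SIZES = [(128, 128), (256, 256), (512, 512)]  # illustrative thumbnail sizes


def handler(event, context):
    # S3 put event: one record per uploaded object
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        for width, height in SIZES:
            img = Image.open(BytesIO(body)).convert("RGB")
            img.thumbnail((width, height))  # resizes in place
            out = BytesIO()
            img.save(out, format="JPEG")
            # Write under a prefix excluded from the trigger filter
            # to avoid an infinite upload-process loop
            s3.put_object(
                Bucket=bucket,
                Key=f"thumbnails/{width}x{height}/{key}",
                Body=out.getvalue(),
            )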

Core Concepts Recap

Serverless Means:

  1. Compute abstraction: You write functions, platform handles servers
  2. Invocation-based billing: Pay for what you use, not what you provision
  3. Automatic scaling: Traffic spikes handled without configuration
  4. Ephemeral execution: No persistent in-memory state
  5. Managed infrastructure: Updates, patching, capacity planning all automatic

The Trade-Off:

  • Gain: Operational simplicity, cost efficiency at scale-down, faster time-to-market
  • Cost: Loss of control, different programming patterns, potential lock-in, complexity at scale-up

When To Use This Model

Use serverless architecture when:

  • Request patterns are unpredictable or bursty
  • Average utilization is <30% of peak capacity
  • You have multiple isolated workloads (better to scale independently)
  • Startup time (cold start) is acceptable for your SLA
  • Functions complete in <15 minutes

Signals you're thinking about this right:

  • You're designing around events, not servers
  • You're thinking about concurrency limits, not instance counts
  • You're considering external databases, not in-process caches
  • Your billing conversation is about function duration, not compute capacity

When NOT to Use Serverless

Don't use serverless when:

  • Workload is continuously running (consider containers)
  • Functions need >15 minute execution time (complexity increases)
  • You need sub-50ms cold start latency (use containers/VMs)
  • Cost per GB-hour is cheaper on reserved capacity (at extreme scale)
  • You require extensive customization of runtime environment

Next Steps

With this foundation—understanding that serverless shifts from provisioning servers to responding to events—we're ready to explore:

  1. Event-Driven Architecture (Lesson 2): How to decompose systems into producers and consumers
  2. Stateless Design (Lesson 3): Why functions must be ephemeral and where state goes
  3. Loose Coupling (Lesson 4): Communication patterns that enable independent scaling

The architecture you're about to build isn't harder than traditional systems—it's different. The difference is deliberate and backed by the economics of the cloud.