Handling Bursty Workloads

The Problem

Normal traffic: 10 requests/second
Black Friday: 10,000 requests/second (suddenly)

Traditional servers: get slammed and go down.

Serverless: Scales automatically, but you might hit limits.


Simple Explanation

What it is

Bursty workloads are traffic spikes that appear suddenly, like flash sales or viral events.

Why we need it

If you do not plan for bursts, your system can throttle or fail at the exact moment users care most.

Benefits

  • Stable performance during spikes.
  • Reduced failures by buffering or rate limiting.
  • Predictable user experience even under load.

Tradeoffs

  • Extra infrastructure like queues and caches.
  • Potential delays if you buffer too much work.

Real-world examples (architecture only)

  • Flash sale -> Queue + worker functions -> Steady processing.
  • Viral post -> CDN + cache -> Lower backend load.

Burst Capacity

Lambda has burst capacity: it can serve an initial burst of concurrent invocations instantly, up to 3,000 in the largest Regions (the limit ranges from 500 to 3,000 per Region).

After that, it scales up gradually, adding roughly 500 concurrent executions per minute under the classic scaling model:

Minute 0: 3,000 concurrent
Minute 1: 3,500 concurrent
Minute 2: 4,000 concurrent
...growth continues until the account concurrency limit

(Newer Lambda scaling is faster: each function can add up to 1,000 concurrent executions every 10 seconds.)

Burst capacity buys you time to scale smoothly.
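
Under the classic model the ramp-up time is easy to estimate. A back-of-the-envelope sketch (the constants mirror the classic defaults above; illustrative only):

def seconds_to_reach(target, initial_burst=3000, per_minute=500):
    """Rough time for Lambda concurrency to reach `target`
    under the classic scaling model (illustrative only)."""
    if target <= initial_burst:
        return 0
    return (target - initial_burst) / per_minute * 60


print(seconds_to_reach(10_000))  # 840.0 seconds, i.e. 14 minutes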

Queue-Based Architecture

Instead of direct API calls, use queues:

Spike in traffic -> SQS queue buffers requests -> Lambda processes at a steady rate -> No throttling, no 503 errors.

Users wait in the queue briefly (better than failing outright).

Implementation

APIGateway:
  # Fast path: accepts the request, enqueues it, returns immediately
  Type: AWS::Serverless::Function

Queue:
  Type: AWS::SQS::Queue
  Properties:
    VisibilityTimeout: 300  # Seconds; keep above the processing function's timeout

ProcessingFunction:
  # Drains the queue at a steady rate
  Type: AWS::Serverless::Function
  Properties:
    Events:
      SQSEvent:
        Type: SQS
        Properties:
          Queue: !GetAtt Queue.Arn
          BatchSize: 10
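
The fast path itself is only a few lines: push the payload onto the queue and return 202 Accepted right away. A minimal sketch with boto3 (the QUEUE_URL environment variable is an assumption of this example):

import json
import os

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = os.environ["QUEUE_URL"]  # assumed to be injected by the template


def handler(event, context):
    # Buffer the work instead of doing it inline
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=event.get("body") or "{}")
    # 202 Accepted: queued, not yet processed
    return {"statusCode": 202, "body": json.dumps({"status": "queued"})}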

DynamoDB Throttling

During a burst, DynamoDB may throttle writes if you use provisioned capacity:

# ❌ Provisioned: fixed at 100 writes/sec; anything above throttles
ItemsTable:
  Type: AWS::DynamoDB::Table
  Properties:
    ProvisionedThroughput:
      WriteCapacityUnits: 100

# ✅ On-demand: capacity scales automatically with traffic
ItemsTable:
  Type: AWS::DynamoDB::Table
  Properties:
    BillingMode: PAY_PER_REQUEST

Use on-demand billing for apps that expect bursts.
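
On the client side, throttling on a provisioned table surfaces as a ProvisionedThroughputExceededException. boto3 already retries a few times internally; a sketch of making the backoff explicit (the table name is an example):

import time

import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("Items")  # example table name


def put_item_with_backoff(item, retries=3):
    # Retry throttled writes with exponential backoff
    for attempt in range(retries):
        try:
            table.put_item(Item=item)
            return
        except ClientError as e:
            if e.response["Error"]["Code"] != "ProvisionedThroughputExceededException":
                raise  # Not a throttle: fail fast
            time.sleep(2 ** attempt)  # 1s, 2s, 4s
    raise RuntimeError("Write still throttled after retries")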

API Rate Limiting

Prevent abuse during bursts:

UsagePlan:
  Type: AWS::ApiGateway::UsagePlan
  Properties:
    ApiStages:
      - ApiId: !Ref MyApi
        Stage: prod
    Throttle:
      BurstLimit: 5000   # Absorb short spikes
      RateLimit: 2000    # Sustained requests/sec
    Quota:
      Limit: 100000000   # Daily request limit
      Period: DAY

Clients that exceed these limits receive HTTP 429 (Too Many Requests).
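
Well-behaved clients back off and retry on 429 instead of hammering the API. A minimal sketch using requests (the URL is a placeholder):

import time

import requests


def get_with_backoff(url, max_attempts=5):
    # Retry on 429, honoring the server's Retry-After hint when present
    for attempt in range(max_attempts):
        response = requests.get(url)
        if response.status_code != 429:
            return response
        delay = float(response.headers.get("Retry-After", 2 ** attempt))
        time.sleep(delay)
    raise RuntimeError(f"Still rate limited after {max_attempts} attempts")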

Graceful Degradation

When near capacity, degrade service instead of failing:

CAPACITY_THRESHOLD = 0.8  # Start shedding low-priority load at 80% utilization


def handle_request(event):
    # get_lambda_metrics() is a placeholder for your metrics source
    # (e.g., CloudWatch ConcurrentExecutions vs. the account limit)
    metrics = get_lambda_metrics()
    utilization = metrics["concurrentExecutions"] / metrics["limit"]

    if utilization > CAPACITY_THRESHOLD:
        # Near capacity: serve only requests marked high priority
        if "x-priority" not in (event.get("headers") or {}):
            return {"statusCode": 503, "body": "Service busy"}

    return process_normally(event)

Circuit Breaker

Stop calling overloaded services:

import time

import requests

failure_count = 0
circuit_open = False
last_failure_time = 0


def call_external_service():
    global failure_count, circuit_open, last_failure_time

    if circuit_open:
        # After a 60-second cool-down, let one request test the service
        if time.time() - last_failure_time > 60:
            circuit_open = False
            failure_count = 0
        else:
            raise Exception("Circuit breaker open")

    try:
        response = requests.get("https://api.external.com/data")
        response.raise_for_status()  # Count HTTP errors as failures too
        failure_count = 0  # Reset on success
        return response.json()
    except Exception as e:
        failure_count += 1
        last_failure_time = time.time()

        if failure_count > 5:
            circuit_open = True  # Stop calling the overloaded service

        raise e

Caching for Burst Mitigation

Reduce load with aggressive caching:

import time

cache = {}  # Module-level: survives warm invocations of the same instance
CACHE_TTL = 60  # Seconds


def get_expensive_data(key):
    cached = cache.get(key)
    if cached and (time.time() - cached["time"]) < CACHE_TTL:
        return cached["value"]

    value = expensive_query(key)  # Placeholder for the real database call
    cache[key] = {"value": value, "time": time.time()}
    return value

During a burst, cache hits avoid database queries entirely; because the cache lives at module level, it persists across warm invocations of the same Lambda instance.

Scheduled Scaling

Pre-scale before an expected spike. A sketch using Application Auto Scaling scheduled actions (function name, alias, date, and capacities are examples):

ProvisionedConcurrencyTarget:
  Type: AWS::ApplicationAutoScaling::ScalableTarget
  Properties:
    ServiceNamespace: lambda
    ScalableDimension: lambda:function:ProvisionedConcurrency
    ResourceId: function:CheckoutFunction:live  # function:alias
    MinCapacity: 10
    MaxCapacity: 1000
    ScheduledActions:
      - ScheduledActionName: BlackFridayPreScale
        # 11 PM EST = 04:00 UTC, before the midnight sale
        Schedule: at(2024-11-29T04:00:00)
        ScalableTargetAction:
          MinCapacity: 1000

Use EventBridge to schedule the increase and decrease:

# Scale up at 10 PM UTC every Friday
aws events put-rule \
  --name scale-up \
  --schedule-expression "cron(0 22 ? * FRI *)"

# Point the rule at a Lambda that raises provisioned concurrency
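
That scheduled Lambda can call the Lambda API directly. A minimal sketch with boto3 (function name, alias, and target value are examples):

import boto3

lambda_client = boto3.client("lambda")


def handler(event, context):
    # Raise provisioned concurrency ahead of the expected spike
    lambda_client.put_provisioned_concurrency_config(
        FunctionName="CheckoutFunction",       # example function
        Qualifier="live",                      # alias or version
        ProvisionedConcurrentExecutions=1000,  # example target
    )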

Load Testing

Simulate bursts before launch:

# Burst test: 10,000 virtual users, 10 requests each
artillery quick --count 10000 --num 10 \
  https://api.example.com

# Monitor:
# - Response times
# - Error rates
# - DynamoDB throttling
# - Lambda throttling

Fix bottlenecks before real traffic arrives.

Monitoring Bursts

Alert when burst happens:

BurstDetectionAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    Namespace: AWS/Lambda
    MetricName: Invocations
    Statistic: Sum
    Period: 60
    EvaluationPeriods: 1
    Threshold: 5000  # 5,000/min ≈ 83/sec
    ComparisonOperator: GreaterThanThreshold
    # Add AlarmActions (e.g., an SNS topic) to get notified

Real Example: Black Friday

Preparation:

  • Load test with 10x expected traffic
  • Enable on-demand DynamoDB
  • Increase Lambda provisioned concurrency 10x
  • Setup queue-based processing
  • Enable caching
  • Setup alarms

Result:

  • Orders per hour: 1,000,000 (vs. 100,000 normal)
  • Errors: < 0.01%
  • Cost: 5x normal (acceptable)
  • ROI: the sales spike drove a 50x revenue increase, dwarfing the extra cost

Best Practices

  1. Use on-demand for databases — Auto-scales with bursts
  2. Queue requests — Smooth out spikes
  3. Cache aggressively — Reduce load during burst
  4. Load test — Know your limits before users do
  5. Monitor strictly — Alert on anomalies
  6. Graceful degradation — Better degraded than down

Hands-On: Burst Handling

  1. Create API with mock 1-second response time
  2. Load test with normal traffic (100 req/sec)
  3. Load test with burst (5,000 req/sec)
  4. Add queue-based processing
  5. Re-test burst, see improvement

Key Takeaway

Bursts are inevitable. Prepare with queuing, caching, and on-demand scaling. You won't be caught off guard.