Performance Tuning
Profiling
Find slow code with distributed tracing.
Using X-Ray
Enable X-Ray (see Level 3, Lesson 5) and find:
- Which services are slowest?
- Which SQL queries are slowest?
- Where are bottlenecks?
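If you want to pull the slow traces from a script instead of the console, the X-Ray API exposes trace summaries. A minimal sketch with boto3 (the one-hour window and the one-second duration filter are arbitrary examples):

import datetime
import boto3

xray = boto3.client("xray")

# Traces slower than 1 second over the last hour (thresholds are examples)
end = datetime.datetime.now(datetime.timezone.utc)
start = end - datetime.timedelta(hours=1)

resp = xray.get_trace_summaries(
    StartTime=start,
    EndTime=end,
    FilterExpression="duration > 1",   # X-Ray filter expression, in seconds
)
for summary in resp["TraceSummaries"]:
    print(summary["Id"], summary.get("Duration"))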
Using CloudWatch Logs Insights
fields @duration, @initDuration
| stats avg(@duration), max(@duration) by bin(1m)
Identify slow time windows.
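The same query can also be run programmatically, which is handy for tracking trends over time. A rough sketch with boto3 (the log group name is a placeholder for your function's log group):

import time
import boto3

logs = boto3.client("logs")

query_id = logs.start_query(
    logGroupName="/aws/lambda/MyFunction",   # placeholder log group
    startTime=int(time.time()) - 3600,       # last hour
    endTime=int(time.time()),
    queryString=(
        "fields @duration, @initDuration "
        "| stats avg(@duration), max(@duration) by bin(1m)"
    ),
)["queryId"]

# Poll until Logs Insights finishes, then print the per-minute stats
while True:
    result = logs.get_query_results(queryId=query_id)
    if result["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(1)

for row in result["results"]:
    print({field["field"]: field["value"] for field in row})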
Simple Explanation
What it is
Performance tuning is the practice of making serverless functions faster and more efficient.
Why we need it
At scale, small inefficiencies become big costs. Faster code is cheaper and gives users a better experience.
Benefits
- Lower latency for users.
- Lower compute cost per request.
- More headroom during spikes.
Tradeoffs
- Requires measurement before changes make sense.
- Over-optimization can hurt clarity and maintainability.
Real-world examples (architecture only)
- Slow response -> Parallelize calls -> Faster output.
- High cost -> Reduce payload size -> Lower duration.
Code Level Optimization
1. Reduce Serialization
Parsing JSON is relatively expensive, so parse once and reuse the result:
import json
# ❌ Parse multiple times
data1 = json.loads(event.get("body") or "{}")
data2 = json.loads(event.get("body") or "{}")
# ✅ Parse once
data = json.loads(event.get("body") or "{}")
data1 = data
data2 = data
2. Parallel Queries
# ❌ Sequential: 100ms + 100ms + 100ms = 300ms
user_id = event.get("userId")
user = get_user(user_id)
orders = get_orders(user_id)
preferences = get_preferences(user_id)
# ✅ Parallel: max(100ms, 100ms, 100ms) = 100ms
import concurrent.futures

with concurrent.futures.ThreadPoolExecutor() as executor:
    user, orders, preferences = executor.map(
        lambda fn: fn(user_id),
        [get_user, get_orders, get_preferences],
    )
3. Batch Operations
# ❌ 1,000 sequential GetItem calls: 1,000 network round trips
user_ids = event.get("userIds", [])
for user_id in user_ids:
    ddb.get_item(TableName="Users", Key={"id": {"S": user_id}})

# ✅ BatchGetItem (up to 100 keys per call): ~10 round trips for 1,000 keys
for i in range(0, len(user_ids), 100):
    ddb.batch_get_item(
        RequestItems={
            "Users": {
                "Keys": [{"id": {"S": uid}} for uid in user_ids[i:i + 100]]
            }
        }
    )
    # Production code should also retry any UnprocessedKeys in the response
4. Cache Aggressively
cache = {}

def get_user(user_id):
    # The module-level dict survives across warm invocations of the same execution environment
    if user_id in cache:
        return cache[user_id]
    user = ddb.get_item(TableName="Users", Key={"id": {"S": user_id}}).get("Item")
    cache[user_id] = user
    return user
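The dict above never expires entries, so cached items can go stale and the cache grows for as long as the execution environment lives. A minimal time-based variant (the 60-second TTL is an arbitrary example; ddb is the same boto3 client as above):

import time

CACHE_TTL_SECONDS = 60   # example TTL; tune to how stale your data is allowed to be
_cache = {}              # user_id -> (expires_at, item)

def get_user_cached(user_id):
    entry = _cache.get(user_id)
    if entry and entry[0] > time.monotonic():
        return entry[1]
    item = ddb.get_item(TableName="Users", Key={"id": {"S": user_id}}).get("Item")
    _cache[user_id] = (time.monotonic() + CACHE_TTL_SECONDS, item)
    return item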
Database Optimization
DynamoDB
1. Use appropriate indexes:
ItemsTable:
  Type: AWS::DynamoDB::Table
  Properties:
    # ... base table KeySchema and AttributeDefinitions omitted ...
    GlobalSecondaryIndexes:
      - IndexName: UserIdIndex
        KeySchema:
          - AttributeName: userId
            KeyType: HASH
          - AttributeName: createdAt
            KeyType: RANGE
        Projection:
          ProjectionType: ALL
This lets you Query by userId (sorted by createdAt) instead of scanning the table.
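For example, querying the index from Python with the boto3 client (ddb and user_id follow the earlier examples):

response = ddb.query(
    TableName="Items",
    IndexName="UserIdIndex",
    KeyConditionExpression="userId = :u",
    ExpressionAttributeValues={":u": {"S": user_id}},
    ScanIndexForward=False,   # newest first, since createdAt is the sort key
)
items = response["Items"]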
2. Projection Expression:
# ❌ Fetch all attributes
ddb.query(TableName="Items", **params)

# ✅ Fetch only the attributes you need
ddb.query(
    TableName="Items",
    ProjectionExpression="id, #n, price",
    ExpressionAttributeNames={"#n": "name"},  # "name" is a DynamoDB reserved word
    **params,
)
RDS
1. Query optimization:
-- ❌ Slow: Full table scan
SELECT * FROM users WHERE name = 'John';
-- ✅ Fast: Use index
CREATE INDEX idx_name ON users(name);
SELECT * FROM users WHERE name = 'John';
2. Connection pooling:
Use RDS Proxy (covered in Lesson 1).
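A common companion pattern is to open the database connection once per execution environment and reuse it across warm invocations, pointing it at the RDS Proxy endpoint. A rough sketch assuming pymysql and environment variables for the connection details (all names are placeholders):

import os
import pymysql

# Created once per execution environment, reused by warm invocations
connection = pymysql.connect(
    host=os.environ["DB_PROXY_ENDPOINT"],   # RDS Proxy endpoint (placeholder)
    user=os.environ["DB_USER"],
    password=os.environ["DB_PASSWORD"],
    database=os.environ["DB_NAME"],
)

def handler(event, context):
    with connection.cursor() as cursor:
        cursor.execute("SELECT id, name FROM users WHERE id = %s", (event["userId"],))
        return cursor.fetchone()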
Lambda Optimization
1. Ephemeral Storage
Increase /tmp storage if processing files:
MyFunction:
  Type: AWS::Serverless::Function
  Properties:
    EphemeralStorage:
      Size: 10240   # 10 GB (default is 512 MB)
Process large files faster.
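For example, staging a large S3 object in /tmp and processing it locally (bucket, key, and file name are placeholders):

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    local_path = "/tmp/input.csv"   # ephemeral storage
    s3.download_file("my-bucket", "exports/input.csv", local_path)   # placeholder bucket/key
    with open(local_path) as f:
        # Process locally instead of streaming the object repeatedly
        return {"lines": sum(1 for _ in f)}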
2. Memory & CPU
More memory also allocates proportionally more CPU:

MyFunction:
  Type: AWS::Serverless::Function
  Properties:
    MemorySize: 3008   # ~1.7 vCPUs; the maximum is 10240 MB (~6 vCPUs)

The per-millisecond price is higher, but if the extra CPU makes the function finish proportionally faster, total cost stays flat or drops.
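A quick back-of-the-envelope calculation shows why: Lambda bills GB-seconds, so more memory only raises cost if duration does not shrink proportionally. The price constant and the example durations below are illustrative assumptions, not measurements:

PRICE_PER_GB_SECOND = 0.0000166667   # approximate x86 on-demand price; check current pricing

def invocation_cost(memory_mb, duration_ms):
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND

# Hypothetical measurements: 1024 MB taking 800 ms vs 3008 MB taking 250 ms
print(invocation_cost(1024, 800))   # ~0.0000133 per invocation
print(invocation_cost(3008, 250))   # ~0.0000122 per invocation: faster AND slightly cheaper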
3. Architecture
Use the arm64 (Graviton) architecture where your code and native dependencies support it:

MyFunction:
  Type: AWS::Serverless::Function
  Properties:
    Architectures:
      - arm64   # default is x86_64

Graviton2 (arm64) functions are billed about 20% less per GB-second and typically offer better price performance; just make sure any compiled dependencies are built for arm64.
API Gateway Optimization
1. Caching
API Gateway can cache responses per method. For example, to cache GET requests to the /items endpoint for 5 minutes:

MyApi:
  Type: AWS::Serverless::Api
  Properties:
    StageName: prod
    CacheClusterEnabled: true       # caching requires a cache cluster on the stage
    CacheClusterSize: '0.5'
    MethodSettings:
      - ResourcePath: '/items'
        HttpMethod: GET
        CachingEnabled: true
        CacheTtlInSeconds: 300
2. Compression
Enable compression for large responses. MinimumCompressionSize is set on the API itself, not in MethodSettings:

MyApi:
  Type: AWS::Serverless::Api
  Properties:
    MinimumCompressionSize: 1024   # compress responses larger than 1 KB

Responses over 1 KB are compressed when the client sends an Accept-Encoding header (e.g. gzip).
Network Optimization
1. VPC Configuration
Lambda → RDS in a VPC:

HelloWorld:
  Type: AWS::Serverless::Function
  Properties:
    VpcConfig:
      SecurityGroupIds:
        - sg-12345
      SubnetIds:
        - subnet-12345

Attaching a function to a VPC adds some cold-start overhead (far less than the old 5-10 seconds now that Lambda uses Hyperplane ENIs), so only do it when the function needs private resources such as RDS.
2. CloudFront
Cache API responses globally:
CloudFrontDistribution:
  Type: AWS::CloudFront::Distribution
  Properties:
    DistributionConfig:
      Enabled: true
      # Origins and DefaultCacheBehavior omitted for brevity
      CacheBehaviors:
        - PathPattern: '/api/*'
          TargetOriginId: myapi
          ViewerProtocolPolicy: https-only
          CachePolicyId: 658327ea-f89d-4fab-a63d-7e88639e58f6   # Managed-CachingOptimized
Serving cached responses from edge locations is a massive latency win for users far from your origin region (only cache responses that are safe to share, such as GETs).
Benchmarking
Test after each optimization:
# Time function execution
time sam local invoke MyFunction -e event.json

# Load test with increasing concurrency
for c in 1 10 100 1000; do
  ab -n 10000 -c $c https://api.example.com/
done

# Compare before/after times
Track improvements over time.
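If you want percentiles rather than ApacheBench's summary output, a small script can measure them directly. A rough sketch using only the standard library (the URL and sample size are placeholders, and sequential requests measure latency, not behavior under load):

import time
import urllib.request

URL = "https://api.example.com/"   # placeholder endpoint
N = 200                            # sample size for a quick check

latencies_ms = []
for _ in range(N):
    start = time.perf_counter()
    urllib.request.urlopen(URL).read()
    latencies_ms.append((time.perf_counter() - start) * 1000)

latencies_ms.sort()
print(f"p50={latencies_ms[int(0.50 * (N - 1))]:.0f} ms")
print(f"p99={latencies_ms[int(0.99 * (N - 1))]:.0f} ms")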
Monitoring Performance
P99 Latency
Track and alert on tail latency (P99), not just the median; the average hides your worst user experience:

P99DurationAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    Namespace: AWS/Lambda
    MetricName: Duration
    ExtendedStatistic: p99          # percentiles use ExtendedStatistic, not Statistic
    Period: 60
    EvaluationPeriods: 5
    Threshold: 500                  # Alert if P99 > 500 ms
    ComparisonOperator: GreaterThanThreshold
SLOs (Service Level Objectives)
Commit to users:
- 95% of requests < 200ms
- 99.9% uptime
Monitor against SLOs continuously.
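A small sketch of the arithmetic behind those targets, to make the error budget concrete (the latency samples are invented for illustration):

# 99.9% uptime over a 30-day month
total_minutes = 30 * 24 * 60
print(f"Allowed downtime: {total_minutes * (1 - 0.999):.1f} minutes/month")   # 43.2

# "95% of requests < 200ms" checked against measured latencies
latencies_ms = [120, 180, 150, 240, 90, 300, 170]   # example measurements
within_slo = sum(1 for x in latencies_ms if x < 200) / len(latencies_ms)
print(f"Within SLO: {within_slo:.1%}   (target: >= 95%)")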
Real-World Example
Optimization Journey:
Before:
- Avg latency: 800ms
- P99: 2000ms
- Errors: 0.5%
Optimization 1: Add caching
- Avg latency: 600ms
- P99: 1500ms
Optimization 2: Parallel queries
- Avg latency: 300ms
- P99: 800ms
Optimization 3: Increase memory
- Avg latency: 150ms
- P99: 400ms
Result: 5.3x lower average latency (800ms → 150ms) and 5x lower P99, with the shorter durations offsetting the higher memory price.
Best Practices
- Measure first — Use profiling, not guesswork
- Parallelize — Independent I/O calls can run concurrently at little extra cost
- Cache aggressively — Every ms matters
- Right-size memory — More memory = more CPU = faster
- Use CloudFront — Massive latency improvement for global apps
Hands-On: Profile & Optimize
- Deploy API endpoint
- Measure latency (X-Ray or CloudWatch Logs Insights)
- Identify bottleneck
- Apply optimization
- Remeasure latency
- Calculate improvement
Key Takeaway
Performance is a feature. Every optimization compounds. Small improvements add up to transformative speedups.