Performance Tuning
Profiling
Find slow code with distributed tracing.
Using X-Ray
Enable X-Ray (see Level 3, Lesson 5) and find:
- Which services are slowest?
- Which SQL queries are slowest?
- Where are bottlenecks?
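If you want to pull the slow traces from a script instead of the console, the X-Ray API exposes trace summaries. A minimal sketch with boto3 (the one-hour window and the one-second duration filter are arbitrary examples):

import datetime
import boto3

xray = boto3.client("xray")

# Traces slower than 1 second over the last hour (thresholds are examples)
end = datetime.datetime.now(datetime.timezone.utc)
start = end - datetime.timedelta(hours=1)

resp = xray.get_trace_summaries(
    StartTime=start,
    EndTime=end,
    FilterExpression="duration > 1",   # X-Ray filter expression, in seconds
)
for summary in resp["TraceSummaries"]:
    print(summary["Id"], summary.get("Duration"))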
Using CloudWatch Logs Insights
fields @duration, @initDuration
| stats avg(@duration), max(@duration) by bin(1m)
Identify slow time windows.
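The same query can also be run programmatically, which is handy for tracking trends over time. A rough sketch with boto3 (the log group name is a placeholder for your function's log group):

import time
import boto3

logs = boto3.client("logs")

query_id = logs.start_query(
    logGroupName="/aws/lambda/MyFunction",   # placeholder log group
    startTime=int(time.time()) - 3600,       # last hour
    endTime=int(time.time()),
    queryString=(
        "fields @duration, @initDuration "
        "| stats avg(@duration), max(@duration) by bin(1m)"
    ),
)["queryId"]

# Poll until Logs Insights finishes, then print the per-minute stats
while True:
    result = logs.get_query_results(queryId=query_id)
    if result["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(1)

for row in result["results"]:
    print({field["field"]: field["value"] for field in row})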
Simple Explanation
What it is
Performance tuning is the practice of making serverless functions faster and more efficient.
Why we need it
At scale, small inefficiencies become big costs. Faster code is cheaper and gives users a better experience.
Benefits
- Lower latency for users.
- Lower compute cost per request.
- More headroom during spikes.
Tradeoffs
- Requires measurement before changes make sense.
- Over-optimization can hurt clarity and maintainability.
Real-world examples (architecture only)
- Slow response -> Parallelize calls -> Faster output.
- High cost -> Reduce payload size -> Lower duration.
Code Level Optimization
1. Reduce Serialization
Parsing JSON is relatively expensive, so parse once and reuse the result:
import json
# ❌ Parse multiple times
data1 = json.loads(event.get("body") or "{}")
data2 = json.loads(event.get("body") or "{}")
# ✅ Parse once
data = json.loads(event.get("body") or "{}")
data1 = data
data2 = data
2. Parallel Queries
# ❌ Sequential: 100ms + 100ms + 100ms = 300ms
user_id = event.get("userId")
user = get_user(user_id)
orders = get_orders(user_id)
preferences = get_preferences(user_id)
# ✅ Parallel: max(100ms, 100ms, 100ms) = 100ms
import concurrent.futures

with concurrent.futures.ThreadPoolExecutor() as executor:
    user, orders, preferences = executor.map(
        lambda fn: fn(user_id),
        [get_user, get_orders, get_preferences],
    )
3. Batch Operations
# ❌ 1,000 sequential GetItem calls: 1,000 network round trips
user_ids = event.get("userIds", [])
for user_id in user_ids:
    ddb.get_item(TableName="Users", Key={"id": {"S": user_id}})

# ✅ BatchGetItem (up to 100 keys per call): ~10 round trips for 1,000 keys
for i in range(0, len(user_ids), 100):
    ddb.batch_get_item(
        RequestItems={
            "Users": {
                "Keys": [{"id": {"S": uid}} for uid in user_ids[i:i + 100]]
            }
        }
    )
    # Production code should also retry any UnprocessedKeys in the response
4. Cache Aggressively
cache = {}

def get_user(user_id):
    # The module-level dict survives across warm invocations of the same execution environment
    if user_id in cache:
        return cache[user_id]
    user = ddb.get_item(TableName="Users", Key={"id": {"S": user_id}}).get("Item")
    cache[user_id] = user
    return user
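The dict above never expires entries, so cached items can go stale and the cache grows for as long as the execution environment lives. A minimal time-based variant (the 60-second TTL is an arbitrary example; ddb is the same boto3 client as above):

import time

CACHE_TTL_SECONDS = 60   # example TTL; tune to how stale your data is allowed to be
_cache = {}              # user_id -> (expires_at, item)

def get_user_cached(user_id):
    entry = _cache.get(user_id)
    if entry and entry[0] > time.monotonic():
        return entry[1]
    item = ddb.get_item(TableName="Users", Key={"id": {"S": user_id}}).get("Item")
    _cache[user_id] = (time.monotonic() + CACHE_TTL_SECONDS, item)
    return item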
Database Optimization
DynamoDB
1. Use appropriate indexes:
ItemsTable:
  Type: AWS::DynamoDB::Table
  Properties:
    # ... base table KeySchema and AttributeDefinitions omitted ...
    GlobalSecondaryIndexes:
      - IndexName: UserIdIndex
        KeySchema:
          - AttributeName: userId
            KeyType: HASH
          - AttributeName: createdAt
            KeyType: RANGE
        Projection:
          ProjectionType: ALL
This lets you Query by userId (sorted by createdAt) instead of scanning the table.
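For example, querying the index from Python with the boto3 client (ddb and user_id follow the earlier examples):

response = ddb.query(
    TableName="Items",
    IndexName="UserIdIndex",
    KeyConditionExpression="userId = :u",
    ExpressionAttributeValues={":u": {"S": user_id}},
    ScanIndexForward=False,   # newest first, since createdAt is the sort key
)
items = response["Items"]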
2. Projection Expression:
# ❌ Fetch all attributes
ddb.query(TableName="Items", **params)

# ✅ Fetch only the attributes you need
ddb.query(
    TableName="Items",
    ProjectionExpression="id, #n, price",
    ExpressionAttributeNames={"#n": "name"},  # "name" is a DynamoDB reserved word
    **params,
)
RDS
1. Query optimization:
-- ❌ Slow: Full table scan
SELECT * FROM users WHERE name = 'John';
-- ✅ Fast: Use index
CREATE INDEX idx_name ON users(name);
SELECT * FROM users WHERE name = 'John';
2. Connection pooling:
Use RDS Proxy (covered in Lesson 1).
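A common companion pattern is to open the database connection once per execution environment and reuse it across warm invocations, pointing it at the RDS Proxy endpoint. A rough sketch assuming pymysql and environment variables for the connection details (all names are placeholders):

import os
import pymysql

# Created once per execution environment, reused by warm invocations
connection = pymysql.connect(
    host=os.environ["DB_PROXY_ENDPOINT"],   # RDS Proxy endpoint (placeholder)
    user=os.environ["DB_USER"],
    password=os.environ["DB_PASSWORD"],
    database=os.environ["DB_NAME"],
)

def handler(event, context):
    with connection.cursor() as cursor:
        cursor.execute("SELECT id, name FROM users WHERE id = %s", (event["userId"],))
        return cursor.fetchone()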
Lambda Optimization
1. Ephemeral Storage
Increase /tmp storage if processing files:
MyFunction:
  Type: AWS::Serverless::Function
  Properties:
    EphemeralStorage:
      Size: 10240   # 10 GB (default is 512 MB)
Process large files faster.
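For example, staging a large S3 object in /tmp and processing it locally (bucket, key, and file name are placeholders):

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    local_path = "/tmp/input.csv"   # ephemeral storage
    s3.download_file("my-bucket", "exports/input.csv", local_path)   # placeholder bucket/key
    with open(local_path) as f:
        # Process locally instead of streaming the object repeatedly
        return {"lines": sum(1 for _ in f)}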
2. Memory & CPU
More memory also allocates proportionally more CPU:

MyFunction:
  Type: AWS::Serverless::Function
  Properties:
    MemorySize: 3008   # ~1.7 vCPUs; the maximum is 10240 MB (~6 vCPUs)

The per-millisecond price is higher, but if the extra CPU makes the function finish proportionally faster, total cost stays flat or drops.
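A quick back-of-the-envelope calculation shows why: Lambda bills GB-seconds, so more memory only raises cost if duration does not shrink proportionally. The price constant and the example durations below are illustrative assumptions, not measurements:

PRICE_PER_GB_SECOND = 0.0000166667   # approximate x86 on-demand price; check current pricing

def invocation_cost(memory_mb, duration_ms):
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND

# Hypothetical measurements: 1024 MB taking 800 ms vs 3008 MB taking 250 ms
print(invocation_cost(1024, 800))   # ~0.0000133 per invocation
print(invocation_cost(3008, 250))   # ~0.0000122 per invocation: faster AND slightly cheaper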
3. Architecture
Use the arm64 (Graviton) architecture where your code and native dependencies support it:

MyFunction:
  Type: AWS::Serverless::Function
  Properties:
    Architectures:
      - arm64   # default is x86_64

Graviton2 (arm64) functions are billed about 20% less per GB-second and typically offer better price performance; just make sure any compiled dependencies are built for arm64.
API Gateway Optimization
1. Caching
API Gateway can cache responses per method. For example, to cache GET requests to the /items endpoint for 5 minutes:

MyApi:
  Type: AWS::Serverless::Api
  Properties:
    StageName: prod
    CacheClusterEnabled: true       # caching requires a cache cluster on the stage
    CacheClusterSize: '0.5'
    MethodSettings:
      - ResourcePath: '/items'
        HttpMethod: GET
        CachingEnabled: true
        CacheTtlInSeconds: 300
2. Compression
Enable compression for large responses. MinimumCompressionSize is set on the API itself, not in MethodSettings:

MyApi:
  Type: AWS::Serverless::Api
  Properties:
    MinimumCompressionSize: 1024   # compress responses larger than 1 KB

Responses over 1 KB are compressed when the client sends an Accept-Encoding header (e.g. gzip).
Network Optimization
1. VPC Configuration
Lambda → RDS in a VPC:

HelloWorld:
  Type: AWS::Serverless::Function
  Properties:
    VpcConfig:
      SecurityGroupIds:
        - sg-12345
      SubnetIds:
        - subnet-12345

Attaching a function to a VPC adds some cold-start overhead (far less than the old 5-10 seconds now that Lambda uses Hyperplane ENIs), so only do it when the function needs private resources such as RDS.
2. CloudFront
Cache API responses globally:
CloudFrontDistribution:
  Type: AWS::CloudFront::Distribution
  Properties:
    DistributionConfig:
      Enabled: true
      # Origins and DefaultCacheBehavior omitted for brevity
      CacheBehaviors:
        - PathPattern: '/api/*'
          TargetOriginId: myapi
          ViewerProtocolPolicy: https-only
          CachePolicyId: 658327ea-f89d-4fab-a63d-7e88639e58f6   # Managed-CachingOptimized
Serving cached responses from edge locations is a massive latency win for users far from your origin region (only cache responses that are safe to share, such as GETs).
Benchmarking
Test after each optimization:
# Time function execution
time sam local invoke MyFunction -e event.json

# Load test with increasing concurrency
for c in 1 10 100 1000; do
  ab -n 10000 -c $c https://api.example.com/
done

# Compare before/after times
Track improvements over time.
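If you want percentiles rather than ApacheBench's summary output, a small script can measure them directly. A rough sketch using only the standard library (the URL and sample size are placeholders, and sequential requests measure latency, not behavior under load):

import time
import urllib.request

URL = "https://api.example.com/"   # placeholder endpoint
N = 200                            # sample size for a quick check

latencies_ms = []
for _ in range(N):
    start = time.perf_counter()
    urllib.request.urlopen(URL).read()
    latencies_ms.append((time.perf_counter() - start) * 1000)

latencies_ms.sort()
print(f"p50={latencies_ms[int(0.50 * (N - 1))]:.0f} ms")
print(f"p99={latencies_ms[int(0.99 * (N - 1))]:.0f} ms")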
Monitoring Performance
P99 Latency
Track and alert on tail latency (P99), not just the median; the average hides your worst user experience:

P99DurationAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    Namespace: AWS/Lambda
    MetricName: Duration
    ExtendedStatistic: p99          # percentiles use ExtendedStatistic, not Statistic
    Period: 60
    EvaluationPeriods: 5
    Threshold: 500                  # Alert if P99 > 500 ms
    ComparisonOperator: GreaterThanThreshold
SLOs (Service Level Objectives)
Commit to users:
- 95% of requests < 200ms
- 99.9% uptime
Monitor against SLOs continuously.
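A small sketch of the arithmetic behind those targets, to make the error budget concrete (the latency samples are invented for illustration):

# 99.9% uptime over a 30-day month
total_minutes = 30 * 24 * 60
print(f"Allowed downtime: {total_minutes * (1 - 0.999):.1f} minutes/month")   # 43.2

# "95% of requests < 200ms" checked against measured latencies
latencies_ms = [120, 180, 150, 240, 90, 300, 170]   # example measurements
within_slo = sum(1 for x in latencies_ms if x < 200) / len(latencies_ms)
print(f"Within SLO: {within_slo:.1%}   (target: >= 95%)")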
Real-World Example
Optimization Journey:
Before:
- Avg latency: 800ms
- P99: 2000ms
- Errors: 0.5%
Optimization 1: Add caching
- Avg latency: 600ms
- P99: 1500ms
Optimization 2: Parallel queries
- Avg latency: 300ms
- P99: 800ms
Optimization 3: Increase memory
- Avg latency: 150ms
- P99: 400ms
Result: 5.3x lower average latency (800ms → 150ms) and 5x lower P99, with the shorter durations offsetting the higher memory price.
Best Practices
- Measure first — Use profiling, not guesswork
- Parallelize — Independent I/O calls can run concurrently at little extra cost
- Cache aggressively — Every ms matters
- Right-size memory — More memory = more CPU = faster
- Use CloudFront — Massive latency improvement for global apps
Hands-On: Profile & Optimize
- Deploy API endpoint
- Measure latency (X-Ray or CloudWatch Logs Insights)
- Identify bottleneck
- Apply optimization
- Remeasure latency
- Calculate improvement
Key Takeaway
Performance is a feature. Every optimization compounds. Small improvements add up to transformative speedups.