“Lambda is slow” is almost never true. What is true is that an under-tuned function pays for a cold start it could have priced away, runs on a fraction of a vCPU because someone set 128 MB and forgot, and opens a fresh database connection on every invocation because the handler does its work in the wrong scope. Latency on Lambda is a tuning problem, not a platform limit. This guide walks through the levers I reach for, in order of leverage: understand the cold start, tune memory (which is also CPU), decide whether provisioned concurrency or SnapStart is justified, fix connection reuse, and plan concurrency so a load spike does not turn into a wall of throttles.
1. Anatomy of a cold start
A cold start is everything that happens before your handler runs for the first time on a fresh execution environment. An invocation has three measurable parts, and the cold start is the first two:
- Download and init the environment — provision the microVM, pull your deployment package or container image, start the runtime.
- Init phase (your code) — everything outside the handler: imports, SDK clients, static config, connection setup. This runs once per environment and is billed.
- Invoke (warm path) — your handler body. After the first invocation the environment is reused, so subsequent calls skip the first two parts until the environment is recycled.
The init phase is where you have the most control. Two things dominate it: package size and what your code does at import time. A 250 MB unzipped bundle that eagerly constructs a dozen SDK clients and reads SSM parameters synchronously will have an init phase measured in seconds. Trim both.
# What is actually in the bundle? Init time tracks closely with this.
unzip -l function.zip | tail -1
# For Node, prune dev deps and bundle/tree-shake so only used code ships
npm prune --omit=dev
npx esbuild src/handler.js --bundle --minify --platform=node \
  --target=node20 '--external:@aws-sdk/*' --outfile=dist/handler.js
The AWS SDK v3 (@aws-sdk/*) and boto3 are already present in the managed runtimes. Marking the SDK --external and not bundling it keeps your artifact small. Pin to a known version with a layer only if you need behavior the runtime’s bundled SDK lacks.
You can read the init duration directly from the REPORT line in CloudWatch Logs — Init Duration only appears on cold-start invocations, which makes it a clean signal to filter on.
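As a minimal sketch of that filter, the Python below splits a REPORT line into its fields and returns the init duration only when it is present. The function names and sample lines are mine, invented for illustration; the tab-separated "Key: value" layout is what Lambda writes, but treat the parser as a starting point, not a complete one.

```python
from typing import Optional


def parse_report(line: str) -> dict:
    """Split a Lambda REPORT log line into its tab-separated 'Key: value' fields."""
    fields = {}
    for part in line.strip().split("\t"):
        key, sep, value = part.partition(": ")
        if sep:  # skip fragments that are not 'Key: value'
            fields[key.strip()] = value.strip()
    return fields


def init_duration_ms(line: str) -> Optional[float]:
    """Return Init Duration in ms, or None for a warm invocation."""
    value = parse_report(line).get("Init Duration")
    return float(value.split()[0]) if value else None


# Hypothetical sample lines (request IDs and numbers are invented)
COLD = ("REPORT RequestId: 11111111-aaaa\tDuration: 102.25 ms\tBilled Duration: 103 ms\t"
        "Memory Size: 1024 MB\tMax Memory Used: 88 MB\tInit Duration: 412.70 ms\t")
WARM = ("REPORT RequestId: 22222222-bbbb\tDuration: 8.10 ms\tBilled Duration: 9 ms\t"
        "Memory Size: 1024 MB\tMax Memory Used: 88 MB\t")

print(init_duration_ms(COLD))  # a float on the cold start
print(init_duration_ms(WARM))  # None on the warm call
```

Counting the lines where this returns a value gives you the cold-start count; averaging those values gives the init cost.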
2. Memory is CPU: right-size with Lambda Power Tuning
This is the single highest-leverage knob and the most misunderstood. Lambda allocates CPU proportionally to memory. At 1,769 MB a function gets the equivalent of one full vCPU; below that you get a fraction, above it you get more than one. A CPU-bound function at 128 MB is not “cheap” — it runs ~14x slower than at 1,769 MB, and because Lambda bills GB-seconds, the slower run can cost the same or more while delivering far worse latency.
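The arithmetic behind that claim is easy to check. The sketch below assumes a purely CPU-bound, single-threaded function whose duration scales inversely with its CPU share (real workloads are messier), and it uses an assumed x86 on-demand price per GB-second; substitute the rate for your Region and architecture.

```python
FULL_VCPU_MB = 1769                 # memory at which Lambda grants one full vCPU (from the text)
PRICE_PER_GB_SECOND = 0.0000166667  # ASSUMPTION: x86 on-demand rate; check your Region


def vcpu_share(memory_mb: int) -> float:
    """Fraction of a vCPU implied by the memory setting."""
    return memory_mb / FULL_VCPU_MB


def estimated_duration_s(duration_at_full_vcpu_s: float, memory_mb: int) -> float:
    """ASSUMPTION: CPU-bound and single-threaded, so extra vCPUs above one do not help."""
    return duration_at_full_vcpu_s / min(vcpu_share(memory_mb), 1.0)


def duration_cost(memory_mb: int, duration_s: float) -> float:
    """Duration charge only: GB-seconds times the per-GB-second price."""
    return (memory_mb / 1024) * duration_s * PRICE_PER_GB_SECOND


slow = estimated_duration_s(1.0, 128)   # a job that takes 1 s on a full vCPU
fast = estimated_duration_s(1.0, 1769)
print(f"128 MB:  {slow:.1f} s, ${duration_cost(128, slow):.8f}")
print(f"1769 MB: {fast:.1f} s, ${duration_cost(1769, fast):.8f}")
```

Under this model both runs consume the same GB-seconds, so the 128 MB setting saves nothing and takes about 14 times longer.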
Do not guess. Run AWS Lambda Power Tuning, an open-source Step Functions state machine that invokes your function across a memory sweep and plots cost against speed.
# Deploy the tuner with SAM from a local checkout of the aws-lambda-power-tuning template
# (the same application is also published in the Serverless Application Repository)
sam deploy \
  --template-file template.yaml \
  --stack-name lambda-power-tuning \
  --capabilities CAPABILITY_IAM \
  --parameter-overrides "PowerValues=128,256,512,1024,1536,1769,3008"
Then start an execution of the deployed state machine with an input like this:
{
  "lambdaARN": "arn:aws:lambda:us-east-1:111122223333:function:order-processor",
  "powerValues": [128, 256, 512, 1024, 1536, 1769, 3008],
  "num": 50,
  "payload": { "orderId": "test-123" },
  "strategy": "balanced"
}
The balanced strategy returns the memory setting at the best cost-vs-speed tradeoff; use speed for latency-critical paths and cost for batch work. I have repeatedly found that moving a JSON-crunching function from 512 MB to 1024 MB cuts duration by more than half, which lowers cost as well: doubling memory doubles the price per millisecond, so the bill drops only when the work finishes in less than half the time. Always tune with a representative payload, because synthetic empty events lie.
3. Provisioned concurrency: pre-warmed capacity
If your tuned function still cannot tolerate cold starts on the critical path (a synchronous API behind API Gateway, a checkout flow), provisioned concurrency keeps a pool of environments initialized and ready, so the init phase has already happened before traffic arrives. It is configured against a version or alias — never $LATEST — which forces a clean deploy-then-shift model.
# Publish an immutable version, then point the alias (and provisioned concurrency) at it
VERSION=$(aws lambda publish-version \
  --function-name order-processor \
  --query 'Version' --output text)
aws lambda update-alias \
  --function-name order-processor \
  --name live \
  --function-version "$VERSION"
aws lambda put-provisioned-concurrency-config \
  --function-name order-processor \
  --qualifier live \
  --provisioned-concurrent-executions 20
Static provisioning wastes money outside peak. Drive it with Application Auto Scaling on a schedule or a utilization target so you pay for warmth only when you need it:
aws application-autoscaling register-scalable-target \
--service-namespace lambda \
--resource-id function:order-processor:live \
--scalable-dimension lambda:function:ProvisionedConcurrency \
--min-capacity 5 --max-capacity 100
aws application-autoscaling put-scaling-policy \
--service-namespace lambda \
--resource-id function:order-processor:live \
--scalable-dimension lambda:function:ProvisionedConcurrency \
--policy-name pc-utilization \
--policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration '{
"TargetValue": 0.7,
"PredefinedMetricSpecification": {
"PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
}
}'
Key facts to internalize: you pay for provisioned concurrency for as long as it is enabled, whether or not it is invoked, plus the usual per-request charge and a reduced duration charge when it is used. If demand exceeds your provisioned pool, the overflow spills to standard on-demand concurrency, and those requests can cold-start. Watch the ProvisionedConcurrencySpilloverInvocations metric: sustained spillover means you should raise the floor.
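To pick a starting value for the pool, a common rule of thumb is that in-flight executions equal arrival rate times average duration. The sketch below applies that and then divides by the utilization target so the pool has headroom. The function names and the traffic numbers are hypothetical, and it assumes steady traffic; bursty traffic needs more headroom than this gives.

```python
def in_flight(requests_per_second: int, avg_duration_ms: int) -> float:
    """Average concurrent executions: arrival rate times time in the system."""
    return requests_per_second * avg_duration_ms / 1000


def provisioned_target(requests_per_second: int, avg_duration_ms: int,
                       target_utilization_pct: int = 70) -> int:
    """Pool size that keeps average utilization at or below the target.

    Integer arithmetic throughout, so the ceiling is not thrown off by float rounding.
    """
    numerator = requests_per_second * avg_duration_ms * 100
    denominator = 1000 * target_utilization_pct
    return -(-numerator // denominator)  # ceiling division


# Hypothetical traffic: 100 requests/s at 140 ms average duration
print(in_flight(100, 140))           # 14.0 concurrent executions on average
print(provisioned_target(100, 140))  # 20 environments at a 70% utilization target
```

Treat the result as the floor to start from, then let the spillover metric tell you whether it holds under real traffic.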
4. SnapStart: snapshot-restore instead of re-init
SnapStart attacks cold starts from a different angle. Instead of keeping environments warm (and paying for idle capacity), Lambda runs your init once when you publish a version, takes a Firecracker microVM snapshot of the initialized memory and disk, and restores from that snapshot on cold start instead of re-running init. There is no provisioned-concurrency-style idle charge, and for Java there is no extra charge at all; for Python and .NET, AWS bills for caching and restoring the snapshot, so check the pricing page. SnapStart launched for Java, and AWS has extended it to Python and .NET runtimes; confirm the supported runtimes in your Region before committing.
# AWS SAM: enable SnapStart on a Java function
OrderProcessor:
  Type: AWS::Serverless::Function
  Properties:
    Runtime: java21
    Handler: com.example.Handler::handleRequest
    MemorySize: 1024
    SnapStart:
      ApplyOn: PublishedVersions
    AutoPublishAlias: live
The caveats are real and you must design for them:
- Uniqueness. Anything generated once during init and captured in the snapshot (a random seed, a UUID, a cached timestamp) is now identical across every restored environment. Re-seed SecureRandom and regenerate per-invocation values after restore, not at class load. The AWS Cryptography libraries handle this for you; hand-rolled randomness does not.
- Stale state. Network connections, credentials, and ephemeral tokens captured in the snapshot may be dead or expired on restore. Re-establish them in a runtime hook.
- Priming. Restore is fast, but the JVM may still JIT-compile and lazy-load on the first real request. Use the beforeCheckpoint hook to prime hot paths (dummy invocations of your serialization, an SDK call) so that work is captured in the snapshot.
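import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;

public class Handler implements Resource {
    public Handler() {
        // Register this instance so its hooks run around the snapshot
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {
        // Prime: exercise hot paths so JIT/class-load is captured in the snapshot
        warmSerializers();
        warmSdkClients();
    }

    @Override
    public void afterRestore(Context<? extends Resource> context) throws Exception {
        // Re-establish anything that must be fresh per environment
        reSeedSecureRandom();
        refreshDbCredentials();
    }

    // Placeholders: replace with your own priming and refresh logic.
    // The handleRequest method is omitted here for brevity.
    private void warmSerializers() {}
    private void warmSdkClients() {}
    private void reSeedSecureRandom() {}
    private void refreshDbCredentials() {}
}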
SnapStart vs provisioned concurrency is a real decision. SnapStart removes most of the init cost with no idle charge, but every cold start still pays for a snapshot restore, and you take on the uniqueness, stale-state, and priming work above. Provisioned concurrency gives the flattest tail latency, but you pay for warm capacity continuously. The two cannot be layered on the same function version (AWS lists provisioned concurrency as unsupported with SnapStart), so the choice is made per function. Many teams run SnapStart by default and reserve provisioned concurrency for the few endpoints with the strictest p99.
5. Connection management and reuse across invocations
The most common self-inflicted latency bug: opening a database connection, HTTP client, or secret fetch inside the handler. That work then runs on every warm invocation. Move it to module/static scope so it is created once during init and reused across invocations on the same environment.
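import os
import boto3
import psycopg2

# INIT SCOPE: runs once per environment, reused by every warm invocation
_secrets = boto3.client("secretsmanager")  # shared client for any secret lookups you add
_conn = None

def _get_conn():
    """Return the cached connection, reconnecting if it was closed."""
    global _conn
    if _conn is None or _conn.closed:
        # Credentials omitted here; supply dbname/user/password (or an IAM auth token with RDS Proxy)
        _conn = psycopg2.connect(host=os.environ["DB_HOST"], connect_timeout=3)
        _conn.autocommit = True  # do not leave a transaction open between invocations
    return _conn

def handler(event, context):
    with _get_conn().cursor() as cur:  # reuse the connection, close only the cursor
        cur.execute("SELECT 1")
        return {"ok": cur.fetchone()[0]}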
For Node on the AWS SDK v2, set AWS_NODEJS_CONNECTION_REUSE_ENABLED=1 so the SDK reuses keep-alive TCP connections; SDK v3 reuses connections by default, so the variable is unnecessary there but harmless. The deeper problem at scale is connection-count blowup: 500 concurrent Lambda environments each holding their own Postgres connection can exhaust max_connections, especially on a smaller instance class. Amazon RDS Proxy solves this by pooling and multiplexing connections on Lambda’s behalf, and it lets functions authenticate with IAM instead of embedding database passwords.
aws rds create-db-proxy \
--db-proxy-name app-proxy \
--engine-family POSTGRESQL \
--auth '[{"AuthScheme":"SECRETS","SecretArn":"arn:aws:secretsmanager:us-east-1:111122223333:secret:db-creds","IAMAuth":"REQUIRED"}]' \
--role-arn arn:aws:iam::111122223333:role/rds-proxy-role \
--vpc-subnet-ids subnet-0a1b2c subnet-0d4e5f
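Point the function’s DB_HOST at the proxy endpoint, attach the function to the same VPC (in subnets and security groups that can reach the proxy), and let the proxy absorb the connection churn. Because the proxy above sets IAMAuth to REQUIRED, the function connects over TLS with a short-lived IAM auth token instead of a stored password. This is non-negotiable above a few hundred concurrent executions against a relational database.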
6. Concurrency controls: reserved, throttles, and quota planning
Concurrency is the number of in-flight executions. Your account has a regional concurrency limit (1,000 by default, raisable via a quota request). Two controls shape how it is shared:
- Reserved concurrency caps a function at a maximum and guarantees that floor for it, carving it out of the shared pool. Use it to (a) protect a downstream like a database from being overwhelmed and (b) stop one noisy function from starving the rest of the account.
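- Provisioned concurrency (above) is pre-warmed capacity drawn from the same limits: it counts against the account's regional concurrency, and if the function has reserved concurrency, the provisioned amount cannot exceed it.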
# Cap order-processor at 200 concurrent executions
aws lambda put-function-concurrency \
--function-name order-processor \
--reserved-concurrent-executions 200
When a function hits its reserved limit (or the account hits the regional limit), Lambda throttles: synchronous callers get a 429 TooManyRequestsException, while asynchronous and event-source invocations retry with backoff. Plan for it: set reserved concurrency on the function fronting your most fragile dependency, alarm on the Throttles metric, and request a regional quota increase before a launch, not during the incident. New accounts can start with a lower concurrency quota, and Lambda also limits how fast a function can scale up from cold (AWS documents 1,000 additional concurrent executions every 10 seconds per function), so factor both into spike planning.
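Synchronous callers have to handle that 429 themselves. A minimal sketch of retrying with exponential backoff and full jitter follows; the Throttled exception is a stand-in I made up for whatever your client raises on TooManyRequestsException, and the attempt and delay defaults are illustrative, not recommendations.

```python
import random
import time


class Throttled(Exception):
    """Stand-in for a 429 TooManyRequestsException from a synchronous invoke."""


def call_with_backoff(invoke, max_attempts=5, base_delay_s=0.1, max_delay_s=5.0,
                      sleep=time.sleep):
    """Call invoke(); on Throttled, wait a random delay that grows per attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return invoke()
        except Throttled:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the throttle to the caller
            cap = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(random.uniform(0, cap))  # full jitter spreads retries apart


# Usage with a fake invoke that is throttled twice before succeeding
_calls = {"n": 0}

def flaky_invoke():
    _calls["n"] += 1
    if _calls["n"] <= 2:
        raise Throttled()
    return "ok"

print(call_with_backoff(flaky_invoke, sleep=lambda s: None))  # prints "ok"
```

The jitter matters more than the exponent: without it, every throttled caller retries at the same instant and recreates the spike that caused the throttle.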
7. Observability: see the cold starts you are paying for
You cannot tune what you cannot measure. Three layers:
CloudWatch Logs Insights — quantify cold-start frequency and init cost straight from the REPORT lines:
filter @type = "REPORT"
| fields @initDuration, @duration, @billedDuration, @maxMemoryUsed / 1000000 as memUsedMB, @memorySize / 1000000 as memSizeMB
| stats count(*) as invocations,
    count(@initDuration) as coldStarts,
    avg(@initDuration) as avgInitMs,
    pct(@duration, 99) as p99DurationMs,
    max(memUsedMB) as peakMemMB,
    max(memSizeMB) as configuredMemMB
If peakMemMB sits far below your configured memory, you over-allocated; if coldStarts / invocations is high on a latency-sensitive function, that is your provisioned-concurrency / SnapStart signal.
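Both checks are simple ratios over the query output. A small sketch, with hypothetical function names and invented sample numbers; what counts as "high" or "far below" depends on your SLO and is deliberately left out.

```python
def cold_start_ratio(invocations: int, cold_starts: int) -> float:
    """Share of invocations that paid an init phase."""
    return cold_starts / invocations if invocations else 0.0


def memory_headroom(peak_mem_mb: float, configured_mb: float) -> float:
    """Fraction of configured memory that was never used at peak."""
    return 1 - peak_mem_mb / configured_mb


# Invented numbers standing in for one row of query output
print(f"{cold_start_ratio(2000, 50):.1%} of invocations were cold starts")
print(f"{memory_headroom(256, 1024):.0%} of configured memory was unused at peak")
```

Remember that unused memory is not automatically waste: if the function is CPU-bound, the memory setting is buying CPU, so check duration before cutting it.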
Lambda Insights — a managed CloudWatch layer that surfaces CPU, memory, network, and init metrics per function with one config flag. With SAM:
OrderProcessor:
  Type: AWS::Serverless::Function
  Properties:
    Policies:
      - CloudWatchLambdaInsightsExecutionRolePolicy
    Layers:
      - !Sub "arn:aws:lambda:${AWS::Region}:580247275435:layer:LambdaInsightsExtension:53"
AWS X-Ray — turn on active tracing to break a request into segments. The init subsegment shows cold-start cost, and downstream segments (DynamoDB, RDS, an HTTP call) reveal whether your latency is actually in your code or in a dependency you mistuned.
aws lambda update-function-configuration \
--function-name order-processor \
--tracing-config Mode=Active
8. Cost vs latency: a decision framework
There is no universal “fastest” setting — there is the cheapest setting that meets your latency SLO. Walk it in this order:
| Symptom | First lever | Then consider |
|---|---|---|
| Function feels slow, no SLO pressure | Power Tuning (right-size memory) | Trim package / init code |
| High p99 on a synchronous API | Power Tuning, then provisioned concurrency on the alias | SnapStart instead, if the runtime is Java, Python, or .NET |
| JVM cold starts dominate, cost-sensitive | SnapStart with priming hooks | Provisioned concurrency for the few strict-p99 paths |
| DB connection errors at scale | Init-scope reuse + RDS Proxy | Reserved concurrency cap on the DB-facing function |
| Throttles under spike | Request regional quota increase | Reserved concurrency to protect/partition |
The guiding principle: tune memory before you buy warmth. Right-sizing is free and often cuts both latency and cost; provisioned concurrency and SnapStart are how you buy down the cold start that remains, and they trade money or complexity for tail latency. Spend that money only on the paths whose SLO actually requires it.
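"The cheapest setting that meets your latency SLO" translates directly into a filter and a minimum. The sketch below applies it to a memory sweep; the measurements are invented, the names are hypothetical, and the price is an assumed x86 on-demand rate that you should replace with your own.

```python
from typing import List, NamedTuple, Optional

PRICE_PER_GB_SECOND = 0.0000166667  # ASSUMPTION: x86 on-demand rate; check your Region


class Measurement(NamedTuple):
    memory_mb: int
    avg_ms: float   # drives cost
    p99_ms: float   # drives the SLO check


def cost_per_invocation(m: Measurement) -> float:
    """Duration charge for an average invocation at this memory setting."""
    return (m.memory_mb / 1024) * (m.avg_ms / 1000) * PRICE_PER_GB_SECOND


def cheapest_within_slo(sweep: List[Measurement], slo_p99_ms: float) -> Optional[Measurement]:
    """Cheapest setting whose p99 meets the SLO, or None if no setting does."""
    eligible = [m for m in sweep if m.p99_ms <= slo_p99_ms]
    return min(eligible, key=cost_per_invocation) if eligible else None


# Invented sweep results for illustration only
SWEEP = [
    Measurement(512, avg_ms=400, p99_ms=900),
    Measurement(1024, avg_ms=180, p99_ms=420),
    Measurement(1769, avg_ms=110, p99_ms=260),
    Measurement(3008, avg_ms=100, p99_ms=240),
]

print(cheapest_within_slo(SWEEP, slo_p99_ms=500))
print(cheapest_within_slo(SWEEP, slo_p99_ms=100))  # None: memory alone cannot meet this
```

A None result is the signal to move down the table: memory tuning has run out, and the remaining latency has to be bought with provisioned concurrency or SnapStart.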
Enterprise scenario
A payments platform team ran a synchronous “authorize transaction” Lambda (Java 17, Spring) behind API Gateway. p50 was a healthy 40 ms, but p99 spiked to 6+ seconds whenever traffic stepped up — classic JVM cold starts as new environments spun to meet demand. The constraint: a hard contractual p99 < 800 ms with the card network, and a finance mandate to cut Lambda spend that had blown up after they “fixed” an earlier latency issue by setting provisioned concurrency to a flat 300 around the clock — paying for 300 warm JVMs at 3 AM for a daytime workload.
They reworked it in three moves. First, Power Tuning showed the function was CPU-bound; moving from 1024 MB to 1769 MB cut warm duration by ~45% at roughly neutral cost (fewer GB-seconds per call). Second, they enabled SnapStart with a beforeCheckpoint hook that primed the Spring context, Jackson serializers, and the SDK clients, which removed the multi-second class-load/JIT penalty from cold starts entirely, at zero idle cost. Third, they replaced the flat 300 provisioned concurrency with Application Auto Scaling: a small floor (10) for always-on readiness plus a schedule that ramped to 150 during business hours, scaling down overnight.
# The combination that hit the SLO: SnapStart for the floor, scheduled PC for the peak
# Caution: AWS lists provisioned concurrency as unsupported on SnapStart-enabled versions,
# so check the current SnapStart limitations before putting both on one alias.
AuthorizeTxn:
  Type: AWS::Serverless::Function
  Properties:
    Runtime: java17
    MemorySize: 1769
    SnapStart:
      ApplyOn: PublishedVersions
    AutoPublishAlias: live
    ProvisionedConcurrencyConfig:
      ProvisionedConcurrentExecutions: 10  # baseline; Application Auto Scaling ramps to 150 on schedule
Result: p99 settled under 500 ms even during step-ups, and the provisioned-concurrency bill dropped roughly 70% versus the flat-300 configuration. The lesson the team wrote into their runbook: SnapStart removes the init tax for free; provisioned concurrency is for the peak tail you still cannot tolerate — and you scale it, you do not nail it to the floor.
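A back-of-the-envelope check shows why scheduling matters so much. The sketch below compares provisioned GB-hours per day for the flat and scheduled shapes at the same memory size. The 12-hour peak window is my assumption (the story only says "business hours"), ramp time and weekends are ignored, and it is not a reconstruction of the team's actual bill.

```python
def provisioned_gb_hours(schedule, memory_mb):
    """Daily provisioned GB-hours.

    schedule: list of (provisioned_environments, hours_per_day) pairs covering 24 hours.
    """
    assert sum(hours for _, hours in schedule) == 24, "schedule must cover a full day"
    return sum(envs * hours for envs, hours in schedule) * memory_mb / 1024


MEMORY_MB = 1769
flat = provisioned_gb_hours([(300, 24)], MEMORY_MB)
# ASSUMPTION: 150 environments for a 12-hour peak, the floor of 10 otherwise
scheduled = provisioned_gb_hours([(150, 12), (10, 12)], MEMORY_MB)

reduction = 1 - scheduled / flat
print(f"{reduction:.0%} fewer provisioned GB-hours per day")
```

With these assumed hours the reduction comes out at about 73%, in the same range as the figure the team reported; a shorter peak window or a lower peak pushes it higher.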
Verify
Confirm each lever actually took effect before you call it done:
# Memory setting applied
aws lambda get-function-configuration \
--function-name order-processor --query 'MemorySize'
# Provisioned concurrency is READY (not just requested)
aws lambda get-provisioned-concurrency-config \
--function-name order-processor --qualifier live \
--query '{status:Status, allocated:AllocatedProvisionedConcurrentExecutions}'
# SnapStart is on and the version's optimization completed
aws lambda get-function-configuration \
--function-name order-processor --qualifier live \
--query 'SnapStart'
# Reserved concurrency cap is set
aws lambda get-function-concurrency \
--function-name order-processor
Then watch the numbers move: run the CloudWatch Logs Insights query from Section 7 over a window after the change and confirm avgInitMs dropped (or coldStarts fell toward zero on the provisioned alias), p99DurationMs is inside your SLO, and peakMemMB justifies the memory you are paying for. Check the ProvisionedConcurrencySpilloverInvocations and Throttles metrics are flat under representative load.