Optimizing AWS Lambda Performance: Cold Starts, Provisioned Concurrency, SnapStart, and Memory Tuning

“Lambda is slow” is almost never true. What is true is that an under-tuned function pays for a cold start it could have priced away, runs on a fraction of a vCPU because someone set 128 MB and forgot, and opens a fresh database connection on every invocation because the handler does its work in the wrong scope. Latency on Lambda is a tuning problem, not a platform limit. This guide walks through the levers I reach for, in order of leverage: understand the cold start, tune memory (which is also CPU), decide whether provisioned concurrency or SnapStart is justified, fix connection reuse, and plan concurrency so a load spike does not turn into a wall of throttles.

1. Anatomy of a cold start

A cold start is the work AWS does before your handler can run on a fresh execution environment. A request that lands on a new environment passes through three stages, and only the first two are cold-start overhead:

Environment setup: Lambda provisions the microVM, pulls your deployment package or container image, and starts the runtime.
Init phase (your code): everything outside the handler, such as imports, SDK clients, static config, and connection setup. This runs once per environment and is billed.
Invoke: your handler body. Later requests reuse the environment and skip the first two stages (the warm path) until the environment is recycled.

The init phase is where you have the most control. Two things dominate it: package size and what your code does at import time. A 250 MB unzipped bundle that eagerly constructs a dozen SDK clients and reads SSM parameters synchronously can easily have an init phase measured in seconds. Trim both.

# What is actually in the bundle? Init time tends to grow with this.
unzip -l function.zip | tail -1
# For Node, bundle and tree-shake so only the code you use ships
# (quote the glob so the shell passes it to esbuild unexpanded)
npx esbuild src/handler.js --bundle --minify --platform=node \
  --target=node20 '--external:@aws-sdk/*' --outfile=dist/handler.js
# If you ship node_modules unbundled instead, drop dev dependencies first
npm prune --omit=dev

The AWS SDK v3 (@aws-sdk/*) and boto3 are already present in the managed runtimes. Marking the SDK --external and not bundling it keeps your artifact small. Pin to a known version with a layer only if you need behavior the runtime’s bundled SDK lacks.
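The `unzip -l` total tells you how big the artifact is, not what makes it big. A minimal standard-library sketch that sums uncompressed bytes per top-level entry, and per package under `node_modules`, makes the heavy dependencies obvious. `package_key` and `size_by_package` are hypothetical helper names, and the grouping rule is an assumption that fits typical Node and Python zips.

```python
import zipfile
from collections import Counter


def package_key(path: str) -> str:
    """Map a file path inside the zip to the package or directory it belongs to."""
    parts = path.split("/")
    if parts[0] == "node_modules" and len(parts) > 2:
        # Scoped packages sit one level deeper: node_modules/@scope/name/...
        if parts[1].startswith("@") and len(parts) > 3:
            return "/".join(parts[:3])
        return "/".join(parts[:2])
    return parts[0]


def size_by_package(zip_file) -> Counter:
    """Sum uncompressed bytes per package; zip_file is a path or a binary file object."""
    totals: Counter = Counter()
    with zipfile.ZipFile(zip_file) as zf:
        for info in zf.infolist():
            if not info.is_dir():
                totals[package_key(info.filename)] += info.file_size
    return totals
```

For example, `size_by_package("function.zip").most_common(10)` lists the ten largest contributors in bytes, which is usually enough to spot a dependency that should be external or dropped.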

You can read the init duration directly from the REPORT line in CloudWatch Logs — Init Duration only appears on cold-start invocations, which makes it a clean signal to filter on.
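If you export or tail logs locally, the same signal can be pulled out of a raw REPORT line. A minimal sketch, assuming Lambda's default plain-text log format, where REPORT fields are tab-separated `Name: value unit` pairs; `parse_report` and `is_cold_start` are illustrative names. SnapStart functions log `Restore Duration` instead of `Init Duration` on a restore, so the check accepts either.

```python
from typing import Dict, Optional


def parse_report(line: str) -> Optional[Dict[str, float]]:
    """Return the numeric fields of a plain-text REPORT line, or None for other lines."""
    if not line.startswith("REPORT "):
        return None
    fields: Dict[str, float] = {}
    for part in line.split("\t"):
        name, sep, rest = part.partition(": ")
        tokens = rest.split()
        # Keep only "Name: <number> ms|MB" pairs; this skips "REPORT RequestId: ..."
        if sep and len(tokens) == 2 and tokens[1] in ("ms", "MB"):
            try:
                fields[name.strip()] = float(tokens[0])
            except ValueError:
                continue
    return fields


def is_cold_start(fields: Dict[str, float]) -> bool:
    """Init Duration appears only on cold starts; SnapStart restores log Restore Duration."""
    return "Init Duration" in fields or "Restore Duration" in fields
```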

2. Memory is CPU: right-size with Lambda Power Tuning

This is the single highest-leverage knob and the most misunderstood. Lambda allocates CPU proportionally to memory. At 1,769 MB a function gets the equivalent of one full vCPU; below that you get a fraction, and above it you get more than one, which only helps code that runs work on multiple threads or processes. A CPU-bound function at 128 MB is not “cheap”: it can run roughly 14x slower than at 1,769 MB (1,769 / 128 ≈ 13.8), and because Lambda bills GB-seconds, the slower run can cost the same or more while delivering far worse latency.
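To make the "same or more" cost claim concrete, here is the GB-second arithmetic under an idealized assumption: a purely CPU-bound, single-threaded function whose duration scales inversely with memory up to 1,769 MB and stays flat above it. Real functions have I/O and fixed overhead, so treat this as a model rather than a measurement; it also ignores per-request charges and 1 ms billing rounding. The helper names are illustrative.

```python
FULL_VCPU_MB = 1769  # memory at which Lambda allocates one full vCPU


def gb_seconds(memory_mb: int, duration_ms: float) -> float:
    """Billed compute for one invocation: memory in GB times duration in seconds."""
    return (memory_mb / 1024) * (duration_ms / 1000)


def cpu_bound_duration_ms(memory_mb: int, duration_at_full_vcpu_ms: float) -> float:
    """Idealized single-threaded, CPU-bound model.

    Below 1,769 MB speed scales linearly with memory; above it a single thread
    gains nothing, so duration stays flat.
    """
    return duration_at_full_vcpu_ms * FULL_VCPU_MB / min(memory_mb, FULL_VCPU_MB)
```

With 1,000 ms of work at one full vCPU, 128 MB takes about 13.8 s and 1,769 MB takes 1 s, yet both bill about 1.73 GB-seconds; 3,008 MB still takes 1 s but bills about 2.94 GB-seconds, which is why the sweep below matters more than any rule of thumb.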

Do not guess. Run AWS Lambda Power Tuning, an open-source Step Functions state machine that invokes your function across a memory sweep and plots cost against speed.

# Deploy the tuner with SAM from a clone of the aws-lambda-power-tuning repository
# (it is also published in the Serverless Application Repository)
sam deploy \
  --template-file template.yaml \
  --stack-name lambda-power-tuning \
  --capabilities CAPABILITY_IAM \
  --resolve-s3 \
  --parameter-overrides "PowerValues=128,256,512,1024,1536,1769,3008"

Then start an execution of the tuner's state machine with an input like this:

{
  "lambdaARN": "arn:aws:lambda:us-east-1:111122223333:function:order-processor",
  "powerValues": [128, 256, 512, 1024, 1536, 1769, 3008],
  "num": 50,
  "payload": { "orderId": "test-123" },
  "strategy": "balanced"
}

The balanced strategy returns the memory setting with the best cost-vs-speed tradeoff; use speed for latency-critical paths and cost for batch work. I have repeatedly seen a JSON-crunching function move from 512 MB to 1024 MB and more than halve its duration, which lowers cost: doubling memory doubles the price per millisecond, so the bill only drops when duration falls by more than half. Always tune with a representative payload; synthetic empty events lie.
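If you would rather start the sweep from a script or CI job than from the console, the execution input shown above can be built and submitted with boto3's Step Functions client (`start_execution`). A minimal sketch: `build_tuning_input` and `start_tuning` are hypothetical helper names, the state machine ARN is assumed to come from your deployed tuner stack, and the client is passed in so it can be faked in tests.

```python
import json
from typing import Any, Dict, Sequence


def build_tuning_input(
    function_arn: str,
    payload: Dict[str, Any],
    power_values: Sequence[int] = (128, 256, 512, 1024, 1536, 1769, 3008),
    num: int = 50,
    strategy: str = "balanced",
) -> Dict[str, Any]:
    """Build the tuner's execution input, rejecting strategies other than the three above."""
    if strategy not in ("cost", "speed", "balanced"):
        raise ValueError(f"unknown strategy: {strategy}")
    return {
        "lambdaARN": function_arn,
        "powerValues": list(power_values),
        "num": num,
        "payload": payload,  # use a representative payload, not an empty event
        "strategy": strategy,
    }


def start_tuning(sfn_client, state_machine_arn: str, tuning_input: Dict[str, Any]) -> str:
    """Start the tuner's state machine and return the execution ARN."""
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(tuning_input),
    )
    return response["executionArn"]
```

In practice you would call `start_tuning(boto3.client("stepfunctions"), state_machine_arn, build_tuning_input(...))` and read the recommendation from the execution output once it succeeds.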

3. Provisioned concurrency: pre-warmed capacity

If your tuned function still cannot tolerate cold starts on the critical path (a synchronous API behind API Gateway, a checkout flow), provisioned concurrency keeps a pool of environments initialized and ready, so the init phase has already happened before traffic arrives. It is configured against a version or alias — never $LATEST — which forces a clean deploy-then-shift model.

# Publish an immutable version, then point provisioned concurrency at the alias
aws lambda publish-version --function-name order-processor

aws lambda update-alias \
  --function-name order-processor \
  --name live \
  --function-version 42

aws lambda put-provisioned-concurrency-config \
  --function-name order-processor \
  --qualifier live \
  --provisioned-concurrent-executions 20

Static provisioning wastes money outside peak. Drive it with Application Auto Scaling on a schedule or a utilization target so you pay for warmth only when you need it:

aws application-autoscaling register-scalable-target \
  --service-namespace lambda \
  --resource-id function:order-processor:live \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --min-capacity 5 --max-capacity 100

aws application-autoscaling put-scaling-policy \
  --service-namespace lambda \
  --resource-id function:order-processor:live \
  --scalable-dimension lambda:function:ProvisionedConcurrency \
  --policy-name pc-utilization \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 0.7,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
    }
  }'
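The command above covers the utilization target. The schedule half is an Application Auto Scaling scheduled action that moves the scalable target's minimum and maximum capacity at set times, and target tracking then operates inside those bounds. A minimal boto3 sketch; the helper name, action names, and cron expressions are illustrative, and schedules are evaluated in UTC unless you set a time zone.

```python
from typing import Dict


def schedule_provisioned_concurrency(
    aas_client,
    function_name: str,
    alias: str,
    name: str,
    cron: str,
    min_capacity: int,
    max_capacity: int,
) -> Dict:
    """Create or update one scheduled action that resets the PC scaling bounds."""
    if not 0 <= min_capacity <= max_capacity:
        raise ValueError("need 0 <= min_capacity <= max_capacity")
    return aas_client.put_scheduled_action(
        ServiceNamespace="lambda",
        ScheduledActionName=name,
        ResourceId=f"function:{function_name}:{alias}",
        ScalableDimension="lambda:function:ProvisionedConcurrency",
        Schedule=cron,
        ScalableTargetAction={"MinCapacity": min_capacity, "MaxCapacity": max_capacity},
    )
```

For example, one action at `cron(0 8 ? * MON-FRI *)` raising the minimum to 50 and another at `cron(0 20 ? * MON-FRI *)` dropping it back to 5 keep a weekday daytime floor, while the overnight floor matches the `--min-capacity 5` registered above.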

Key facts to internalize: you pay for provisioned concurrency for the time it is enabled, whether or not it is invoked, plus the standard per-request charge and a reduced duration rate when it is used. If demand exceeds your provisioned pool, the overflow spills to standard on-demand concurrency and those requests cold-start. Watch the ProvisionedConcurrencySpilloverInvocations metric: sustained spillover means you should raise the floor.
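Because the always-on charge scales with memory, provisioned executions, and time enabled, comparing a flat pool with a scheduled one is simple arithmetic. A sketch with hypothetical helper names; the price per GB-second is a parameter on purpose, so plug in the current provisioned-concurrency rate for your region and architecture rather than trusting a hard-coded number.

```python
from typing import Iterable, Tuple


def pc_allocated_gb_seconds(memory_mb: int, schedule: Iterable[Tuple[int, float]]) -> float:
    """GB-seconds of provisioned concurrency allocated (and billed), invoked or not.

    schedule: (provisioned executions, hours at that level) pairs for the period.
    """
    gb = memory_mb / 1024
    return sum(executions * gb * hours * 3600 for executions, hours in schedule)


def pc_idle_cost(
    memory_mb: int, schedule: Iterable[Tuple[int, float]], price_per_gb_second: float
) -> float:
    """Always-on charge only; per-request and duration charges for actual use are extra."""
    return pc_allocated_gb_seconds(memory_mb, schedule) * price_per_gb_second
```

For example, at 1,024 MB a flat 100 for 24 hours allocates 2,400 execution-hours, while 20 overnight (14 hours) plus 80 for a 10-hour day allocates 1,080, which is 45% of the flat idle charge before any usage charges.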

4. SnapStart: snapshot-restore instead of re-init

SnapStart attacks cold starts from a different angle. Instead of keeping environments warm (and paying for idle capacity), Lambda runs your init once when you publish a version, takes a Firecracker microVM snapshot of the initialized memory and disk state, and restores from that snapshot on cold start instead of re-running init. There is no warm pool to pay for: on Java, SnapStart carries no additional charge, while SnapStart for Python and .NET bills for snapshot caching and restoration. SnapStart supports Java, and AWS has extended it to Python and .NET runtimes; confirm the supported runtimes in your region before committing.

# AWS SAM — enable SnapStart on a Java function
OrderProcessor:
  Type: AWS::Serverless::Function
  Properties:
    Runtime: java21
    Handler: com.example.Handler::handleRequest
    MemorySize: 1024
    SnapStart:
      ApplyOn: PublishedVersions
    AutoPublishAlias: live

The caveats are real and you must design for them:

Uniqueness. Anything generated once during init and captured in the snapshot — a random seed, a UUID, a cached timestamp — is now identical across every restored environment. Re-seed SecureRandom and regenerate per-invocation values after restore, not at class load. The AWS Cryptography libraries handle this for you; hand-rolled randomness does not.
Stale state. Network connections, credentials, and ephemeral tokens captured in the snapshot may be dead or expired on restore. Re-establish them in a runtime hook.
Priming. Restore is fast, but the JVM may still JIT-compile and lazy-load on the first real request. Use the beforeCheckpoint hook to prime hot paths (dummy invocations of your serialization, an SDK call) so that work is captured in the snapshot.

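package com.example;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import java.util.Map;
import org.crac.Core;
import org.crac.Resource;

public class Handler implements RequestHandler<Map<String, Object>, String>, Resource {
  public Handler() {
    // Register with the CRaC global context so Lambda invokes the hooks below
    Core.getGlobalContext().register(this);
  }

  @Override
  public String handleRequest(Map<String, Object> event, Context context) {
    return "ok";
  }

  @Override
  public void beforeCheckpoint(org.crac.Context<? extends Resource> c) {
    // Prime: exercise hot paths so JIT/class-load is captured in the snapshot
    warmSerializers();
    warmSdkClients();
  }

  @Override
  public void afterRestore(org.crac.Context<? extends Resource> c) {
    // Re-establish anything that must be fresh per environment
    reSeedSecureRandom();
    refreshDbCredentials();
  }

  // Placeholders: replace with your own priming and refresh logic
  private void warmSerializers() {}
  private void warmSdkClients() {}
  private void reSeedSecureRandom() {}
  private void refreshDbCredentials() {}
}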
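The same hooks exist for Python SnapStart. A sketch assuming the `snapshot_restore_py` module that AWS documents for Python SnapStart, with its `register_before_snapshot` and `register_after_restore` functions; it falls back to no-ops when the module is absent so it also runs locally. `prime`, `refresh_unique_state`, and the environment token are illustrative. The point is the uniqueness caveat: per-environment values are regenerated after restore, and per-request values are created in the handler.

```python
import random
import secrets
import uuid

try:
    # Assumed: shipped in Lambda's SnapStart-capable Python runtimes
    from snapshot_restore_py import register_after_restore, register_before_snapshot
except ImportError:
    # Local runs and tests: no snapshots, so registration is a no-op
    def register_before_snapshot(func, *args, **kwargs):
        return func

    def register_after_restore(func, *args, **kwargs):
        return func

# Anti-pattern under SnapStart: a value computed at init is frozen into the
# snapshot, so every restored environment would share it.
# ENVIRONMENT_ID = uuid.uuid4()

_environment_token = ""


def prime():
    """Placeholder: exercise hot paths so their work is captured in the snapshot."""


def refresh_unique_state():
    """Regenerate per-environment state; runs again after every restore."""
    global _environment_token
    _environment_token = secrets.token_hex(16)  # draws fresh OS entropy
    random.seed()  # re-seed the non-crypto PRNG so restored copies diverge


register_before_snapshot(prime)
register_after_restore(refresh_unique_state)
refresh_unique_state()  # also covers ordinary (non-SnapStart) cold starts


def handler(event, context):
    # Per-invocation values are generated inside the handler, never at init
    return {"environment": _environment_token, "requestId": str(uuid.uuid4())}
```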

SnapStart vs provisioned concurrency is a real decision. SnapStart removes most of the init cold start with no warm pool to pay for, but each new environment still pays for a restore on its first request, and you take on hook and priming complexity; provisioned concurrency gives the flattest tail latency, but you pay for warm capacity continuously. Many teams run SnapStart by default and reserve provisioned concurrency for the few endpoints with the strictest p99.

5. Connection management and reuse across invocations

The most common self-inflicted latency bug: opening a database connection, HTTP client, or secret fetch inside the handler. That work then runs on every warm invocation. Move it to module/static scope so it is created once during init and reused across invocations on the same environment.

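import json
import os

import boto3
import psycopg2

# INIT SCOPE: runs once per environment, reused by every warm invocation.
# DB_SECRET_ARN and DB_NAME are assumed env var names; the secret is assumed
# to hold JSON with "username" and "password" keys.
_secrets = boto3.client("secretsmanager")
_creds = json.loads(
    _secrets.get_secret_value(SecretId=os.environ["DB_SECRET_ARN"])["SecretString"]
)
_conn = None

def _get_conn():
    global _conn
    if _conn is None or _conn.closed:
        _conn = psycopg2.connect(
            host=os.environ["DB_HOST"],
            dbname=os.environ["DB_NAME"],
            user=_creds["username"],
            password=_creds["password"],
            connect_timeout=3,
        )
        # Autocommit keeps the session from sitting "idle in transaction" between invocations
        _conn.autocommit = True
    return _conn

def handler(event, context):
    with _get_conn().cursor() as cur:    # reuse the connection; only the cursor is closed
        cur.execute("SELECT 1")
        return {"ok": cur.fetchone()[0]}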

For Node, the AWS SDK v3 reuses keep-alive TCP connections by default; AWS_NODEJS_CONNECTION_REUSE_ENABLED=1 is only needed to turn this on in the older SDK v2. The deeper problem at scale is connection-count blowup: 500 concurrent Lambda environments each holding a Postgres connection can exhaust max_connections on a modest instance and spend a lot of database memory on mostly idle backends. Amazon RDS Proxy solves this by pooling and multiplexing connections on Lambda’s behalf, and it lets functions authenticate to the proxy with IAM instead of handling database passwords; the proxy itself reads the credentials from Secrets Manager.

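aws rds create-db-proxy \
  --db-proxy-name app-proxy \
  --engine-family POSTGRESQL \
  --auth '[{"AuthScheme":"SECRETS","SecretArn":"arn:aws:secretsmanager:us-east-1:111122223333:secret:db-creds","IAMAuth":"REQUIRED"}]' \
  --role-arn arn:aws:iam::111122223333:role/rds-proxy-role \
  --vpc-subnet-ids subnet-0a1b2c subnet-0d4e5f

# Point the proxy at the database ("app-db" is a placeholder instance identifier)
aws rds register-db-proxy-targets \
  --db-proxy-name app-proxy \
  --db-instance-identifiers app-db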

Point the function’s DB_HOST at the proxy endpoint, attach the function to the same VPC subnets, and let the proxy absorb the connection churn. This is non-negotiable above a few hundred concurrent executions against a relational database.
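With `IAMAuth` set to `REQUIRED`, the function does not present the database password; it signs a short-lived (15-minute) auth token with its execution role, which needs `rds-db:connect` permission for the proxy. A minimal sketch using boto3's `generate_db_auth_token`; the helper name, user, and database names are hypothetical, and the client is passed in so the function can be tested without AWS.

```python
from typing import Any, Dict


def proxy_connect_kwargs(rds_client, host: str, port: int, user: str, dbname: str) -> Dict[str, Any]:
    """Build psycopg2.connect() kwargs that authenticate to RDS Proxy with an IAM token.

    Tokens expire, so call this for each new connection rather than once at init.
    """
    token = rds_client.generate_db_auth_token(DBHostname=host, Port=port, DBUsername=user)
    return {
        "host": host,
        "port": port,
        "user": user,
        "dbname": dbname,
        "password": token,     # the signed IAM token stands in for a password
        "sslmode": "require",  # the token should only ever travel over TLS
        "connect_timeout": 3,
    }
```

Inside a `_get_conn()`-style helper, this becomes `psycopg2.connect(**proxy_connect_kwargs(boto3.client("rds"), os.environ["DB_HOST"], 5432, "app_user", "orders"))`, so every reconnect picks up a fresh token while warm invocations keep reusing the open connection.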

6. Concurrency controls: reserved, throttles, and quota planning

Concurrency is the number of in-flight executions. Your account has a regional concurrency limit (1,000 by default, raisable via a quota request). Two controls shape how it is shared:

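Reserved concurrency sets aside a fixed number of concurrent executions for one function, carved out of the shared pool: that many are always available to it, and it can never run more than that. Use it to (a) protect a downstream like a database from being overwhelmed and (b) stop one noisy function from starving the rest of the account.
Provisioned concurrency (above) draws on the same limits but is also pre-warmed; if a function has reserved concurrency, its provisioned concurrency must fit within that reservation.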

# Cap order-processor at 200 concurrent executions
aws lambda put-function-concurrency \
  --function-name order-processor \
  --reserved-concurrent-executions 200

When a function hits its reserved limit (or the account hits the regional limit), Lambda throttles. Synchronous callers get a 429 TooManyRequestsException; asynchronous and event-source invocations are retried with backoff rather than failed immediately. Plan for it: set reserved concurrency on the function fronting your most fragile dependency, alarm on the Throttles metric, and request a regional quota increase before a launch, not during the incident. Also factor the scaling rate into spike planning: new accounts can start with a regional quota well below 1,000, and each function scales up by at most 1,000 additional concurrent executions every 10 seconds.
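For quota planning, the in-flight count you need is roughly arrival rate times average duration (Little's law, the same formula AWS gives for estimating Lambda concurrency), and the sum of all reserved concurrency must leave Lambda's minimum of 100 unreserved executions for everything else. A sketch with hypothetical helper names; the 1,000 default is the account limit mentioned above, and the result is a steady-state average, so add headroom for spikes.

```python
import math
from typing import Dict


def required_concurrency(requests_per_second: float, avg_duration_ms: float) -> int:
    """Steady-state in-flight executions: arrival rate times time in flight."""
    return math.ceil(requests_per_second * avg_duration_ms / 1000)


def fits_account_limit(
    reserved_by_function: Dict[str, int], account_limit: int = 1000, unreserved_floor: int = 100
) -> bool:
    """Lambda keeps at least `unreserved_floor` executions unreserved for other functions."""
    return sum(reserved_by_function.values()) <= account_limit - unreserved_floor
```

For example, 500 requests per second at 250 ms each keeps about 125 executions in flight, comfortably under the 200 cap set above; the same traffic at 500 ms would need 250 and start throttling against that cap.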

7. Observability: see the cold starts you are paying for

You cannot tune what you cannot measure. Three layers:

CloudWatch Logs Insights — quantify cold-start frequency and init cost straight from the REPORT lines:

filter @type = "REPORT"
| fields @initDuration, @duration, @billedDuration, @maxMemoryUsed / 1000000 as memUsedMB
| stats count(*) as invocations,
        count(@initDuration) as coldStarts,
        avg(@initDuration) as avgInitMs,
        pct(@duration, 99) as p99DurationMs,
        max(memUsedMB) as peakMemMB

If peakMemMB sits far below your configured memory, you may have over-allocated, but remember that memory also buys CPU, so confirm with Power Tuning before you cut it. If coldStarts / invocations is high on a latency-sensitive function, that is your provisioned-concurrency or SnapStart signal.
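If you run the query from a script (for example through the CloudWatch Logs `start_query` and `get_query_results` APIs), each result row comes back as a list of `{"field": ..., "value": ...}` pairs with string values. A sketch that turns the aggregate row into the two signals described above; the 1% cold-start and 50% memory thresholds are arbitrary placeholders to tune for your SLOs, and the helper names are hypothetical.

```python
from typing import Dict, List


def row_to_dict(row: List[Dict[str, str]]) -> Dict[str, float]:
    """Convert one Logs Insights result row into {field: number}."""
    out: Dict[str, float] = {}
    for cell in row:
        try:
            out[cell["field"]] = float(cell["value"])
        except (KeyError, ValueError):
            continue  # skip missing or non-numeric fields such as @ptr
    return out


def tuning_signals(stats: Dict[str, float], configured_mb: int,
                   max_cold_ratio: float = 0.01, min_mem_use: float = 0.5) -> List[str]:
    """Flag a high cold-start rate and low peak memory relative to the configured size."""
    signals = []
    invocations = stats.get("invocations", 0.0)
    cold_ratio = stats.get("coldStarts", 0.0) / invocations if invocations else 0.0
    if cold_ratio > max_cold_ratio:
        signals.append(f"cold starts on {cold_ratio:.1%} of invocations")
    if stats.get("peakMemMB", configured_mb) < configured_mb * min_mem_use:
        signals.append(
            f"peak memory {stats['peakMemMB']:.0f} MB of {configured_mb} MB; "
            "confirm with Power Tuning before lowering"
        )
    return signals
```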

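Lambda Insights: a managed CloudWatch extension that surfaces CPU, memory, network, and init metrics per function. Enabling it takes two pieces, the extension layer and the CloudWatchLambdaInsightsExecutionRolePolicy on the execution role; layer versions (and, in a few regions, the publishing account) differ by region, so look up the current ARN for yours. With SAM: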

OrderProcessor:
  Type: AWS::Serverless::Function
  Properties:
    Policies:
      - CloudWatchLambdaInsightsExecutionRolePolicy
    Layers:
      - !Sub "arn:aws:lambda:${AWS::Region}:580247275435:layer:LambdaInsightsExtension:53"

AWS X-Ray — turn on active tracing to break a request into segments. The init subsegment shows cold-start cost, and downstream segments (DynamoDB, RDS, an HTTP call) reveal whether your latency is actually in your code or in a dependency you mistuned.

aws lambda update-function-configuration \
  --function-name order-processor \
  --tracing-config Mode=Active

8. Cost vs latency: a decision framework

There is no universal “fastest” setting — there is the cheapest setting that meets your latency SLO. Walk it in this order:

Symptom	First lever	Then consider
Function feels slow, no SLO pressure	Power Tuning (right-size memory)	Trim package / init code
High p99 on a synchronous API	Power Tuning, then provisioned concurrency on the alias	SnapStart if JVM/Python
JVM cold starts dominate, cost-sensitive	SnapStart with priming hooks	Provisioned concurrency for the few strict-p99 paths
DB connection errors at scale	Init-scope reuse + RDS Proxy	Reserved concurrency cap on the DB-facing function
Throttles under spike	Request regional quota increase	Reserved concurrency to protect/partition

The guiding principle: tune memory before you buy warmth. Right-sizing is free and often cuts both latency and cost; provisioned concurrency and SnapStart are how you buy down the cold start that remains, and they trade money or complexity for tail latency. Spend that money only on the paths whose SLO actually requires it.

Enterprise scenario

A payments platform team ran a synchronous “authorize transaction” Lambda (Java 17, Spring) behind API Gateway. p50 was a healthy 40 ms, but p99 spiked to 6+ seconds whenever traffic stepped up — classic JVM cold starts as new environments spun to meet demand. The constraint: a hard contractual p99 < 800 ms with the card network, and a finance mandate to cut Lambda spend that had blown up after they “fixed” an earlier latency issue by setting provisioned concurrency to a flat 300 around the clock — paying for 300 warm JVMs at 3 AM for a daytime workload.

They reworked it in three moves. First, Power Tuning showed the function was CPU-bound; moving from 1024 MB to 1769 MB cut warm duration by ~45% at roughly neutral cost (slightly fewer GB-seconds per call, since about 1.73x the memory for 0.55x the time is about 0.95x the GB-seconds). Second, they enabled SnapStart with a beforeCheckpoint hook that primed the Spring context, Jackson serializers, and the SDK clients; this removed the multi-second class-load/JIT penalty from cold starts entirely, at zero idle cost. Third, they replaced the flat 300 provisioned concurrency with Application Auto Scaling: a small floor (10) for always-on readiness plus a schedule that ramped to 150 during business hours and scaled down overnight.

# The combination that hit the SLO: SnapStart for the floor, scheduled PC for the peak
AuthorizeTxn:
  Type: AWS::Serverless::Function
  Properties:
    Runtime: java17
    MemorySize: 1769
    SnapStart:
      ApplyOn: PublishedVersions
    AutoPublishAlias: live
    ProvisionedConcurrencyConfig:
      ProvisionedConcurrentExecutions: 10   # baseline; Application Auto Scaling ramps to 150 on schedule

Result: p99 settled under 500 ms even during step-ups, and the provisioned-concurrency bill dropped roughly 70% versus the flat-300 configuration. The lesson the team wrote into their runbook: SnapStart removes the init tax for free; provisioned concurrency is for the peak tail you still cannot tolerate — and you scale it, you do not nail it to the floor.

Verify

Confirm each lever actually took effect before you call it done:

# Memory setting applied
aws lambda get-function-configuration \
  --function-name order-processor --query 'MemorySize'

# Provisioned concurrency is READY (not just requested)
aws lambda get-provisioned-concurrency-config \
  --function-name order-processor --qualifier live \
  --query '{status:Status, allocated:AllocatedProvisionedConcurrentExecutions}'

# SnapStart is on and the version's optimization completed
aws lambda get-function-configuration \
  --function-name order-processor --qualifier live \
  --query 'SnapStart'

# Reserved concurrency cap is set
aws lambda get-function-concurrency \
  --function-name order-processor

Then watch the numbers move: run the CloudWatch Logs Insights query from Section 7 over a window after the change and confirm avgInitMs dropped (or coldStarts fell toward zero on the provisioned alias), p99DurationMs is inside your SLO, and peakMemMB justifies the memory you are paying for. On SnapStart functions, a restore is reported as Restore Duration rather than Init Duration in the REPORT line, so check restores separately. Finally, check that the ProvisionedConcurrencySpilloverInvocations and Throttles metrics stay flat under representative load.
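In a deployment pipeline, "READY, not just requested" is worth enforcing before shifting traffic. A minimal sketch that polls `get_provisioned_concurrency_config` until the status is `READY`, failing fast on `FAILED`; the helper name, timeout, and poll interval are illustrative, and `sleep` is injectable so the loop can be tested without waiting.

```python
import time


def wait_for_provisioned_concurrency(lambda_client, function_name: str, qualifier: str,
                                     timeout_s: float = 600, poll_s: float = 15,
                                     sleep=time.sleep) -> dict:
    """Poll until provisioned concurrency is READY; raise if it FAILED or timed out."""
    deadline = time.monotonic() + timeout_s
    while True:
        config = lambda_client.get_provisioned_concurrency_config(
            FunctionName=function_name, Qualifier=qualifier
        )
        status = config["Status"]
        if status == "READY":
            return config
        if status == "FAILED":
            raise RuntimeError(config.get("StatusReason", "provisioned concurrency failed"))
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status} after {timeout_s}s")
        sleep(poll_s)
```

Called as `wait_for_provisioned_concurrency(boto3.client("lambda"), "order-processor", "live")`, it returns the final config, whose `AllocatedProvisionedConcurrentExecutions` you can compare against what you requested.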

Checklist

Trimmed the deployment package; SDK marked external/not bundled, dev deps pruned.
Heavy init work (clients, connections, secrets) moved to module/static scope, not the handler.
Ran Lambda Power Tuning with a representative payload and applied the recommended memory.
Provisioned concurrency (if used) is on an alias, never $LATEST, and driven by Application Auto Scaling rather than a flat 24x7 number.
SnapStart enabled on supported runtimes, with afterRestore re-seeding randomness/credentials and beforeCheckpoint priming hot paths.
Relational DB access goes through RDS Proxy above a few hundred concurrent executions; connections reused across invocations.
Reserved concurrency set on functions fronting fragile downstreams; regional quota increase requested ahead of launch.
Active tracing (X-Ray) and Lambda Insights enabled; alarms on Throttles and ProvisionedConcurrencySpilloverInvocations.
Verified Init Duration, p99, and peak memory moved in the right direction after the change, not just assumed.

Written by Vinod
