
AWS Well-Architected: Performance Efficiency — Architecture Selection (Compute, Storage, Database, Network), Performance Review, Monitoring, and Trade-offs

Where this fits

Performance Efficiency is the fourth of the six pillars in the AWS Well-Architected Framework (after Operational Excellence, Security, and Reliability, and before Cost Optimization and Sustainability). Its definition is deceptively simple — use computing resources efficiently to meet requirements, and maintain that efficiency as demand changes and technologies evolve — but it is the pillar where architectural laziness costs you the most, because the cloud removes the old excuses (you can no longer claim you were stuck with the hardware procurement gave you). Its five design principles are: democratize advanced technologies (consume them as managed services rather than building them), go global in minutes, use serverless architectures, experiment more often, and consider mechanical sympathy (choose the technology that best aligns to how your workload actually behaves). The Framework expresses its expectations as five numbered best-practice questions — PERF 1 (architecture selection), PERF 2 (compute), PERF 3 (data management/storage and database), PERF 4 (networking), and PERF 5 (process and culture: review, monitoring, and trade-offs). This article walks each sub-component as you would actually implement it, naming the concrete services, artifacts, benchmarks, and trade-offs.

AWS Well-Architected Framework — animated overview

Architecture selection — compute, storage, database, network (PERF 1–4)

What it is. Architecture selection is the discipline of choosing, for each component of a workload, the resource type and configuration that best fits the workload’s access pattern, data shape, and performance goals, and of doing so with evidence rather than habit. In Well-Architected terms this is PERF 1 (“How do you select appropriate cloud resources and architecture patterns for your workload?”), applied across the four dimensions in which the cloud gives you near-unlimited choice: compute, storage, database, and network. The principle that ties them together is mechanical sympathy: matching the technology to the physics and behaviour of the workload, not to what the team last used.

Why it matters. Every other pillar inherits these choices. A latency-sensitive API placed on a throughput-optimized instance family, a random-access dataset on a throughput-optimized HDD volume, a key-value workload bolted onto a relational engine with a JOIN it was never designed for — each is a structural performance ceiling no amount of scaling or caching fully papers over. AWS publishes hundreds of instance types, half a dozen EBS volume types, multiple S3 storage classes, and more than fifteen purpose-built database engines precisely because no single resource is right for every job. Selecting well is the single highest-leverage performance decision you make, and it is cheapest to make at design time.

Compute (PERF 2)

How to do it well. Decide first which compute paradigm the workload wants — instances, containers, or functions — then optimize within it.

Workload shape | Well-suited compute | Why
Spiky / event-driven / unpredictable | Lambda (Graviton, SnapStart) | No idle cost, instant scale, zero capacity planning
Steady microservices, need density | ECS or EKS on EC2 (Karpenter on EKS, capacity providers on ECS) | Bin-packing, fast right-sized nodes, broad instance choice
Stateless containers, ops-light | AWS Fargate | No node management, per-second billing
CPU-bound batch / encoding | C-family (Graviton c7g) + Spot | Best compute price-performance; Spot suits fault-tolerant, interruptible jobs
In-memory caches, big JVMs, analytics | R/X-family | High memory-to-vCPU ratio
ML training / inference | Trn/Inf (Neuron) or P/G (GPU) | Purpose-built accelerators beat general CPU
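The "paradigm first, then optimize within it" rule can be sketched as a small decision function. This is a hypothetical illustration, not an AWS tool: `WorkloadProfile`, its fields, and the returned labels are invented for the example. A real choice would also weigh cost, team skills, and benchmark results. The only hard limit it encodes is Lambda's 15-minute maximum timeout.

```python
from dataclasses import dataclass

LAMBDA_MAX_TIMEOUT_MINUTES = 15  # Lambda's maximum function timeout


@dataclass(frozen=True)
class WorkloadProfile:
    """Hypothetical workload description, used only for this sketch."""
    spiky: bool                # bursty, event-driven, or unpredictable traffic
    max_task_minutes: float    # longest single unit of work
    containerized: bool        # already packaged as container images
    needs_node_control: bool   # custom AMIs, daemon sets, dense bin-packing
    interruptible: bool        # tolerates losing capacity mid-task


def choose_paradigm(p: WorkloadProfile) -> str:
    """Step 1: decide between functions, containers, and instances."""
    if p.spiky and p.max_task_minutes <= LAMBDA_MAX_TIMEOUT_MINUTES:
        return "functions (Lambda)"
    if p.containerized:
        if p.needs_node_control:
            return "containers on EC2 (EKS + Karpenter, or ECS capacity providers)"
        return "containers on Fargate"
    return "EC2 instances"


def choose_purchase_option(p: WorkloadProfile, paradigm: str) -> str:
    """Step 2: optimize within the paradigm; only interruptible work goes to Spot."""
    if paradigm.startswith("functions"):
        return "pay-per-use (requests + duration)"
    if p.interruptible:
        return "Spot"  # EC2 Spot, or Fargate Spot for Fargate tasks
    return "On-Demand / Savings Plans"


webhook = WorkloadProfile(spiky=True, max_task_minutes=0.1, containerized=False,
                          needs_node_control=False, interruptible=False)
transcode = WorkloadProfile(spiky=False, max_task_minutes=45, containerized=True,
                            needs_node_control=True, interruptible=True)
for name, profile in (("webhook", webhook), ("transcode", transcode)):
    paradigm = choose_paradigm(profile)
    print(f"{name}: {paradigm}, {choose_purchase_option(profile, paradigm)}")
```

The two steps are kept separate on purpose: the purchase option only makes sense once the paradigm is fixed (Spot, for example, does not apply to Lambda).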

Storage (PERF 3, data management)

How to do it well. Match the storage service and tier to the access pattern — sequential vs. random, latency vs. throughput, hot vs. cold, shared vs. attached.

Storage need | Service / tier | Note
Static assets, data lake, backups | S3 (class by access) | Intelligent-Tiering when pattern is unknown
General DB / boot / app volume | EBS gp3 | Decouple IOPS+throughput from capacity
High-IOPS, latency-critical DB | EBS io2 Block Express | Sub-millisecond, consistent IOPS
Large sequential scans (logs, big data) | EBS st1 | Throughput-optimized HDD, not for random I/O
Shared POSIX across fleet | EFS (+ IA tiering) | Elastic, multi-AZ; One Zone to save cost
HPC/ML high-throughput scratch | FSx for Lustre | Links to S3, hundreds of GB/s
Ultra-low-latency hot objects | S3 Express One Zone | Single-digit-ms, high request rate
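The gp3 row is easiest to see with arithmetic. gp2 ties baseline IOPS to size (3 IOPS per GiB, with a 100 IOPS floor and a 16,000 IOPS ceiling). gp3 includes a 3,000 IOPS baseline at any size and lets you provision more independently of capacity. The sketch below uses those published figures and the 6,000 IOPS target from the scenario later in this article. The function names are hypothetical, and current EBS limits should be checked before relying on any number here.

```python
import math

GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16_000
GP3_BASELINE_IOPS = 3_000   # included with every gp3 volume, whatever its size


def gp2_baseline_iops(size_gib: int) -> int:
    """Sustained gp2 IOPS for a given size (ignores burst credits, which let
    volumes under 1 TiB burst to 3,000 IOPS for limited periods)."""
    return min(max(GP2_MIN_IOPS, GP2_IOPS_PER_GIB * size_gib), GP2_MAX_IOPS)


def gp2_size_needed_for(target_iops: int) -> int:
    """Smallest gp2 volume (GiB) whose baseline sustains target_iops."""
    if target_iops > GP2_MAX_IOPS:
        raise ValueError("gp2 cannot sustain more than 16,000 IOPS")
    if target_iops <= GP2_MIN_IOPS:
        return 1
    return math.ceil(target_iops / GP2_IOPS_PER_GIB)


def gp3_iops_to_provision(target_iops: int) -> int:
    """Extra IOPS to provision above gp3's included baseline; capacity is chosen
    separately (subject to gp3's IOPS-to-size ratio limit)."""
    return max(0, target_iops - GP3_BASELINE_IOPS)


target = 6_000
print(f"gp2: {gp2_size_needed_for(target)} GiB just to sustain {target} IOPS")
print(f"gp3: size for the data, plus {gp3_iops_to_provision(target)} provisioned IOPS")
```

On gp2, performance forces you to buy 2,000 GiB you may not need. On gp3, you size for the data and pay for the extra IOPS separately.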

Database (PERF 3, data management)

How to do it well. Embrace purpose-built databases — pick the engine by data model and query pattern, not by what the org happens to standardize on. Forcing every dataset into one relational engine is the most common and most expensive Performance Efficiency anti-pattern.

Data / query pattern | Purpose-built service | Why
Transactional relational, joins | Aurora / RDS (Serverless v2, I/O-Optimized) | ACID, SQL, read replicas, managed
Massive key-value, predictable single-digit ms | DynamoDB (+ DAX) | Horizontal scale, on-demand, microsecond cache
Hot read cache / sessions | ElastiCache / MemoryDB | In-memory microsecond latency
Full-text search, log analytics | OpenSearch Service | Inverted index, aggregations
BI / data warehouse | Redshift (Serverless, Spectrum) | Columnar MPP over large datasets + S3
Time-series / IoT telemetry | Timestream | Built-in tiering, time-series functions
Connected/graph data | Neptune | Native graph traversal

Network (PERF 4)

How to do it well. Network choices govern latency, throughput, and jitter — often the dominant term in user-perceived performance.

Goal | Service / feature | Effect
Serve users from a nearby PoP | CloudFront | Edge caching + TLS termination, lower RTT
Reduce jitter for dynamic/TCP/UDP | Global Accelerator | AWS backbone + anycast, faster failover
Route to nearest healthy endpoint | Route 53 latency/geo routing | Lower latency, regional steering
Metro-low-latency compute | Local Zones / Wavelength | Single-digit-ms to end users
High inter-node bandwidth/low tail latency | Cluster placement group + ENA Express/EFA | Tight, fast east-west traffic
Private, high-throughput service access | PrivateLink / VPC endpoints / TGW | Off-internet, predictable performance
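The idea behind latency-based routing with health checks (the Route 53 row) is this: among healthy endpoints, send the user to the one with the lowest latency. The sketch below is a conceptual model only. Route 53 uses AWS-maintained latency measurements between user networks and Regions rather than a table you pass in, and the Region names and latency figures here are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Endpoint:
    healthy: bool        # result of the most recent health check
    latency_ms: float    # latency from this user's network (illustrative value)


def route(endpoints: Dict[str, Endpoint]) -> str:
    """Return the lowest-latency healthy endpoint. Failover falls out naturally
    because unhealthy endpoints are never candidates."""
    healthy = {name: ep for name, ep in endpoints.items() if ep.healthy}
    if not healthy:
        raise RuntimeError("no healthy endpoint available")
    return min(healthy, key=lambda name: healthy[name].latency_ms)


# Hypothetical view from a user in India.
view = {
    "ap-south-1": Endpoint(healthy=True, latency_ms=18),
    "eu-central-1": Endpoint(healthy=True, latency_ms=120),
    "us-east-1": Endpoint(healthy=True, latency_ms=210),
}
print(route(view))                                    # ap-south-1
view["ap-south-1"] = Endpoint(healthy=False, latency_ms=18)
print(route(view))                                    # eu-central-1
```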

Artifacts and decisions. A documented architecture decision record (ADR) per major component capturing the chosen resource, the alternatives considered, and the data/criteria behind the choice; a benchmark harness and results (instance families, volume types, DB engines tested against the real access pattern, not a synthetic one); a caching strategy document; and a load-test report establishing baseline throughput and latency at target load. The recurring decision is evidence over inertia: run a one-day experiment (on-demand capacity makes this cheap) before committing a workload to a resource for years.
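A benchmark harness does not need to be elaborate to be useful. The sketch below is minimal and assumes that each candidate (an instance type, volume type, or engine) can be exercised by a callable that replays the real access pattern. The `benchmark` and `pick_winner` helpers are hypothetical. A production harness would also control warm-up and concurrency, and it would run on the actual target resources.

```python
import math
import time
from typing import Callable, Dict, Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile for p in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def benchmark(candidates: Dict[str, Callable[[], object]], runs: int = 50) -> Dict[str, dict]:
    """Time each candidate `runs` times and summarize latency in milliseconds."""
    results = {}
    for name, operation in candidates.items():
        timings_ms = []
        for _ in range(runs):
            start = time.perf_counter()
            operation()
            timings_ms.append((time.perf_counter() - start) * 1000)
        results[name] = {"p50_ms": percentile(timings_ms, 50),
                         "p99_ms": percentile(timings_ms, 99)}
    return results


def pick_winner(results: Dict[str, dict], metric: str = "p99_ms") -> str:
    """Choose the candidate with the lowest value for the given metric."""
    return min(results, key=lambda name: results[name][metric])


# Stand-ins for "replay the access pattern against candidate A / candidate B".
results = benchmark({"candidate-a": lambda: time.sleep(0.002),
                     "candidate-b": lambda: sum(range(1000))}, runs=20)
print(results, "->", pick_winner(results))
```

Saving `results` alongside the ADR provides the evidence trail, and re-running the same harness later yields the before/after comparison.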

Performance review (PERF 5)

What it is. Performance review is the cultural and procedural mechanism for periodically re-examining your architecture against newer AWS capabilities and your own evolving requirements, then re-validating choices with benchmarks and load tests. It addresses the fact that “the right choice in 2024 may be the wrong choice in 2026”: AWS ships new instance families, storage tiers, and managed services constantly, and your traffic shape changes underneath you. This is the review half of PERF 5 (“What process do you use to support more performance efficiency for your workload?”); earlier versions of the Framework asked it directly as “How do you evolve your workload to take advantage of new releases?”

Why it matters. Performance is not a property you set once; it is a property you sustain. Without a deliberate review cadence, workloads quietly drift into the past: still on gp2, still on x86 when Graviton could deliver better price-performance (AWS cites up to 40% for Graviton2 over comparable x86 instances), still on a self-managed cache that a managed service now does better. The gap compounds silently because nothing breaks; the system just costs more and runs slower than it should.

How to do it well. Run review on two clocks. The first is a scheduled cadence (e.g., quarterly) in which you conduct an AWS Well-Architected Framework Review (WAFR) in the AWS Well-Architected Tool, focused on the Performance Efficiency pillar, and triage the high-risk items (HRIs) it surfaces. The second is an event-driven trigger: subscribe to AWS What’s New and service release notes, watch the AWS Health Dashboard (formerly the Personal Health Dashboard), and open an experiment when a relevant release lands (a new instance generation, a new storage class, a new Aurora feature). Make the review empirical: maintain a repeatable benchmark and load-test harness so re-validation is a button-press, not a project. Use AWS Compute Optimizer and Trusted Advisor performance checks as standing inputs, and manage infrastructure as code so that adopting a new instance family is a one-line, reversible change you can canary.

Review mechanism | Tool / input | Output
Pillar self-assessment | Well-Architected Tool (WAFR) | Prioritized HRIs + improvement plan
Right-sizing signal | Compute Optimizer, Trusted Advisor | Over/under-provisioned findings
New-capability awareness | AWS What’s New, release notes, AWS Health Dashboard | Candidate experiments
Empirical re-validation | Load test (Distributed Load Testing on AWS) + benchmark harness | Pass/fail vs. SLO at target load
Pre-prod safety net | CI/CD canary + IaC | Reversible, measured rollout
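The canary row is where review turns into a decision. One way to express that decision is a promote-or-roll-back rule that compares the canary against the current baseline and the SLO. This is a hypothetical sketch. The 5% regression tolerance, the function names, and the latency samples are assumptions. In practice, the latencies would come from CloudWatch metrics for each deployment.

```python
import math
from typing import Sequence


def p99(samples: Sequence[float]) -> float:
    """Nearest-rank 99th percentile."""
    ordered = sorted(samples)
    return ordered[max(math.ceil(99 * len(ordered) / 100), 1) - 1]


def canary_decision(baseline_ms: Sequence[float], canary_ms: Sequence[float],
                    slo_p99_ms: float, tolerance: float = 0.05) -> str:
    """Hypothetical promote/rollback rule for a canary deployment."""
    baseline_p99, canary_p99 = p99(baseline_ms), p99(canary_ms)
    if canary_p99 >= slo_p99_ms:
        return "rollback"   # breaches the SLO (p99 must stay below the threshold)
    if canary_p99 > baseline_p99 * (1 + tolerance):
        return "rollback"   # inside the SLO, but a measurable regression
    return "promote"


baseline = [180] * 100                       # hypothetical current-version latencies (ms)
print(canary_decision(baseline, [170] * 98 + [185] * 2, slo_p99_ms=300))  # promote
print(canary_decision(baseline, [150] * 98 + [290] * 2, slo_p99_ms=300))  # rollback
```

The second canary stays within the SLO and still gets rolled back: a 290 ms p99 against a 180 ms baseline is a regression that review should catch before customers do.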

Artifacts and decisions. A completed Well-Architected Tool workload report and its improvement plan; a performance review calendar with owners; a benchmark baseline that every review re-runs; a backlog of adoption experiments tied to specific AWS releases; and an evidence trail (load-test results, before/after metrics) attached to every architecture change. The decision each cycle: which one or two changes have a high enough expected performance/cost return to justify an experiment this quarter.

Monitoring (PERF 5)

What it is. Monitoring is the continuous instrumentation that tells you whether the workload is meeting its performance goals right now, that alerts you before customers feel a regression, and that gives you the evidence to drive every other sub-component. The Framework is explicit: you should monitor performance with active (synthetic) and passive (real-user) telemetry, set thresholds tied to business goals, alarm proactively, and feed the data back into review.

Why it matters. You cannot improve, review, or make a trade-off about what you cannot see. Architecture selection without monitoring is a guess; performance review without monitoring has nothing to review. Crucially, averages lie — a healthy mean latency hides a painful p99. Monitoring at percentiles, end to end (including the network path the customer actually traverses), is what turns “it feels slow” into a precise, actionable signal.
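A quick illustration of why averages lie: in the hypothetical sample below, 98 requests take 100 ms and 2 take 2,000 ms. The mean looks tolerable, while p99 shows what the slowest users actually experience. The percentile uses the nearest-rank method. This is one of several common definitions, so other tools may report slightly different values.

```python
import math


def nearest_rank_percentile(samples, p):
    """Nearest-rank percentile for p in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


latencies_ms = [100] * 98 + [2000] * 2       # hypothetical request latencies
mean_ms = sum(latencies_ms) / len(latencies_ms)

print(f"mean = {mean_ms:.0f} ms")                                  # mean = 138 ms
print(f"p50  = {nearest_rank_percentile(latencies_ms, 50)} ms")    # p50  = 100 ms
print(f"p99  = {nearest_rank_percentile(latencies_ms, 99)} ms")    # p99  = 2000 ms
```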

How to do it well. Build a layered observability stack and tie every metric to a goal.

Telemetry type | AWS tool | Answers
Service & custom metrics | CloudWatch (+ agent) | Is each component within its threshold?
Compute deep metrics | Container Insights / Lambda Insights | Where is CPU/memory/throttle pressure?
Distributed tracing | X-Ray / Application Signals | Which hop is adding latency?
Synthetic (active) | CloudWatch Synthetics canaries | Is the journey fast from the outside?
Real-user (passive) | CloudWatch RUM | What do real users in each region see?
SLO / error budget | Application Signals SLOs | Are we meeting the promise to users?

Artifacts and decisions. A KPI / SLO catalog mapping each user-facing goal to a metric, threshold, and owner; a set of CloudWatch dashboards per service and an executive latency view; an alarm and escalation runbook; canary and RUM coverage of the top user journeys; and a performance baseline captured under known load that future comparisons measure against. The core decision is what “good” means numerically — e.g., “checkout API p99 < 300 ms at 5,000 RPS” — because an unquantified goal cannot be monitored or defended.
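Writing the goal down as data makes it checkable. The sketch below encodes “checkout API p99 < 300 ms at 5,000 RPS”. It treats a measurement taken below the target load as inconclusive, because the goal only describes behaviour at that load. The `LatencySlo` type and the three-way result are illustrative assumptions, not an Application Signals API.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LatencySlo:
    name: str
    percentile: float      # e.g. 99 for p99
    threshold_ms: float    # the percentile must stay strictly below this
    at_rps: float          # load at which the promise applies


def nearest_rank(samples: Sequence[float], p: float) -> float:
    ordered = sorted(samples)
    return ordered[max(math.ceil(p * len(ordered) / 100), 1) - 1]


def evaluate(slo: LatencySlo, latencies_ms: Sequence[float], observed_rps: float) -> str:
    """Return "met", "breached", or "inconclusive" (not measured at target load)."""
    if observed_rps < slo.at_rps:
        return "inconclusive"
    value = nearest_rank(latencies_ms, slo.percentile)
    return "met" if value < slo.threshold_ms else "breached"


checkout = LatencySlo("checkout API", percentile=99, threshold_ms=300, at_rps=5_000)
print(evaluate(checkout, [120] * 99 + [280], observed_rps=5_200))   # met
print(evaluate(checkout, [120] * 99 + [280], observed_rps=1_000))   # inconclusive
```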

Trade-offs and continuous improvement (PERF 5)

What it is. This sub-component is the explicit, documented practice of acknowledging that performance is never free or absolute: you constantly trade it against consistency, durability, cost, latency, space, and time, and you keep iterating as data and technology change. The Framework calls out classic trade-offs (consistency, durability, space vs. time, latency) and pairs them with the “experiment more often” design principle and the expectation that you keep evolving the workload. It is the synthesis of the other three: selection sets the starting point, monitoring tells you the truth, review schedules the re-think, and trade-off analysis is how you actually decide.

Why it matters. Naive “make it faster” thinking optimizes one axis and silently degrades another. Adding a cache improves latency but introduces a consistency and invalidation problem. Choosing DynamoDB eventually consistent reads doubles read throughput per unit of read capacity but may return stale data. Multi-AZ synchronous replication boosts durability but adds write latency. Precomputation trades storage for speed. A team that doesn’t make these trade-offs explicit makes them accidentally, and is then surprised when “the performance fix” causes a correctness incident.
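The DynamoDB claim is plain arithmetic. One read capacity unit (RCU) covers one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB. Larger items round up to the next 4 KB, and transactional reads cost twice as much as a strongly consistent read. Below is a minimal calculator for provisioned capacity. The function name is hypothetical.

```python
import math

RCU_ITEM_BYTES = 4 * 1024   # one RCU covers reads of up to 4 KB


def read_capacity_units(item_bytes: int, reads_per_second: int, mode: str) -> int:
    """Provisioned RCUs needed for a steady read rate.

    mode: "strong", "eventual", or "transactional".
    """
    units_per_read = math.ceil(item_bytes / RCU_ITEM_BYTES)
    strong = units_per_read * reads_per_second
    if mode == "strong":
        return strong
    if mode == "eventual":
        return math.ceil(strong / 2)   # eventually consistent: half the cost
    if mode == "transactional":
        return strong * 2              # transactional: double the cost
    raise ValueError(f"unknown mode: {mode}")


for mode in ("strong", "eventual", "transactional"):
    print(mode, read_capacity_units(3 * 1024, 80, mode))
# prints: strong 80, eventual 40, transactional 160 (one per line)
```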

How to do it well. Treat each trade-off as a decision with stated acceptance criteria, measured both before and after.

Trade-off axis | You gain | You give up | AWS lever
Latency vs. consistency | Speed, read scale | Freshness of data | CloudFront/DAX/ElastiCache, read replicas, eventually consistent reads
Space vs. time | Faster reads/queries | Storage + write cost | Materialized views, denormalization, precompute
Durability vs. latency | Faster writes | Recovery guarantees | Async replication, relaxed write quorum
Cost vs. performance | Lower spend | Headroom / peak speed | gp3 vs io2, on-demand vs provisioned capacity, Graviton, Spot
Throughput vs. ordering | Parallelism | Strict global ordering | More Kinesis shards, wider DynamoDB partition-key spread, SQS standard queues or more SQS FIFO message groups
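The throughput-versus-ordering row works because ordering is only guaranteed within a key: a Kinesis partition key, or an SQS FIFO message group. Hashing each key to a partition spreads the load while keeping each key's events in order. For simplicity, the sketch below uses MD5 modulo the partition count. Kinesis also hashes partition keys with MD5, but it maps the result onto shard hash-key ranges, so treat this as a model of the idea rather than its implementation. The event data is hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[str, int]   # (partition key, sequence number within that key)


def partition_for(key: str, partitions: int) -> int:
    """Stable key-to-partition mapping (MD5 digest modulo partition count)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % partitions


def distribute(events: List[Event], partitions: int) -> Dict[int, List[Event]]:
    """Append each event to its key's partition, preserving arrival order."""
    lanes: Dict[int, List[Event]] = defaultdict(list)
    for event in events:
        lanes[partition_for(event[0], partitions)].append(event)
    return dict(lanes)


# Events arrive interleaved across five users; each user has sequence 0, 1, 2.
events = [(f"user-{u}", seq) for seq in range(3) for u in range(5)]
lanes = distribute(events, partitions=4)
for lane, items in sorted(lanes.items()):
    print(lane, items)
```

Adding partitions raises parallelism, but two events with different keys can now be processed in either order. That is exactly the "strict global ordering" the table says you give up.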

Artifacts and decisions. A trade-off register (each decision: axis, choice, accepted cost, acceptance criteria, evidence); A/B / canary results; an improvement backlog in the Well-Architected Tool ranked by expected return; and a post-change performance comparison for every shipped optimization. The discipline: nothing labelled a “performance improvement” merges without naming what it trades away and proving the net result against the SLO.
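The merge rule can be made mechanical. Below is a sketch of a trade-off register entry and a gate that rejects a "performance improvement" unless it meets four conditions: it names what it gives up, states acceptance criteria, shows an improved before/after p99, and meets the SLO. The field names and the gate are hypothetical, and a team might keep the register in a repository or ticketing system rather than in code. The example values come from the StreamForge scenario later in this article.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TradeOff:
    change: str
    axis: str                      # e.g. "latency vs. consistency"
    gives_up: str                  # what is traded away, in plain words
    acceptance_criteria: str
    before_p99_ms: Optional[float] = None
    after_p99_ms: Optional[float] = None


def merge_blockers(entry: TradeOff, slo_p99_ms: float) -> List[str]:
    """Return reasons the change may not merge (an empty list means it may)."""
    problems = []
    if not entry.gives_up.strip():
        problems.append("does not name what it trades away")
    if not entry.acceptance_criteria.strip():
        problems.append("no acceptance criteria")
    if entry.before_p99_ms is None or entry.after_p99_ms is None:
        problems.append("missing before/after evidence")
    else:
        if entry.after_p99_ms >= entry.before_p99_ms:
            problems.append("no measured improvement")
        if entry.after_p99_ms >= slo_p99_ms:
            problems.append("does not meet the SLO")
    return problems


cache = TradeOff(change="cache catalog browse response",
                 axis="latency vs. consistency",
                 gives_up="up to 60 s of staleness; invalidation on publish",
                 acceptance_criteria="browse p99 < 300 ms at 8,000 RPS",
                 before_p99_ms=1400, after_p99_ms=220)
print(merge_blockers(cache, slo_p99_ms=300))   # []
```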

Real-world enterprise scenario

StreamForge Media is a fictional video-streaming and live-events platform (~450 engineers, 18 million monthly active users across India, the EU, and the US) whose flagship app is suffering: catalog browse p99 has crept to 1.4 s, live-event start-up stalls during traffic spikes, and the analytics warehouse can’t keep up. Their VP of Engineering commissions a Performance Efficiency review aligned to the AWS Well-Architected Framework, to be delivered over two quarters. Here is what they do for each sub-component.

Architecture selection — compute. A WAFR plus Compute Optimizer reveals a fleet of over-provisioned x86 m5 instances at ~22% average CPU. They migrate stateless services to Graviton m7g/c7g on EKS with Karpenter for just-in-time right-sizing, move the spiky live-event ingest webhooks to Lambda on arm64 (with SnapStart for their Java functions), and shift fault-tolerant transcoding batch to C-family Spot. Average utilization rises to ~58%; cold-start p99 on the webhook path drops from 1.8 s to 240 ms.

Architecture selection — storage. Catalog artwork and HLS segments move to S3 with Intelligent-Tiering; the hot “now playing” segment prefixes go to S3 Express One Zone. Every gp2 volume is converted to gp3 (independently provisioning 6,000 IOPS where needed) and the metadata database moves to io2 Block Express. Origin reads drop sharply once CloudFront (HTTP/3) fronts S3 — origin egress falls ~70%.

Architecture selection — database. The “one big PostgreSQL” is decomposed by access pattern: the user session and viewing-progress store moves to DynamoDB on-demand with DAX (read p99 from 40 ms to under 2 ms) and Global Tables for multi-Region; the transactional billing core moves to Aurora PostgreSQL Serverless v2 (I/O-Optimized), with RDS Proxy pooling the database connections opened by its Lambda callers; catalog search moves to OpenSearch Service; and the BI workload moves to Redshift Serverless with Spectrum over the S3 data lake. ElastiCache (Valkey) caches the catalog browse response.

Architecture selection — network. Live-event and API traffic is fronted by AWS Global Accelerator (anycast over the AWS backbone) to cut jitter for non-cacheable streams; Route 53 latency-based routing steers users to the nearest of three Regions; Local Zones near major metros in India and the EU shave metro latency; inter-service traffic uses PrivateLink; and the transcoding pipeline runs in a cluster placement group with ENA Express for fast east-west traffic.

Performance review. They establish a quarterly WAFR in the Well-Architected Tool (Performance Efficiency pillar) with named HRI owners, subscribe the platform team to AWS What’s New, and stand up a repeatable load-test harness using Distributed Load Testing on AWS. Compute Optimizer and Trusted Advisor feed a standing right-sizing backlog. Adopting a new instance generation is now a one-line IaC change behind a canary.

Monitoring. They define an SLO catalog (“browse API p99 < 300 ms at 8,000 RPS”, “live start-up p95 < 2 s”) in CloudWatch Application Signals, instrument distributed tracing with X-Ray, add Container Insights and Lambda Insights, deploy CloudWatch Synthetics canaries for the top five journeys, and turn on CloudWatch RUM to see real Core Web Vitals by region. Alarms use anomaly detection and route through EventBridge to PagerDuty.
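CloudWatch anomaly detection trains its own model on a metric's history and alarms when values leave the expected band. A very rough approximation of the idea is a mean ± k·standard-deviation band over a trailing window. This stand-in is an assumption made for illustration, not how CloudWatch computes its band, which also accounts for trends and seasonality. The window values are hypothetical.

```python
import statistics
from typing import Sequence


def is_anomalous(history: Sequence[float], value: float, k: float = 3.0,
                 min_stdev: float = 1e-9) -> bool:
    """Flag `value` if it falls outside mean ± k * stdev of the trailing window."""
    mean = statistics.fmean(history)
    spread = max(statistics.pstdev(history), min_stdev)  # avoid a zero-width band
    return abs(value - mean) > k * spread


window = [100, 102, 98, 101, 99, 100, 103, 97]   # hypothetical p99 latencies (ms)
print(is_anomalous(window, 101))   # False
print(is_anomalous(window, 180))   # True
```

A static threshold set at the SLO would miss the early drift that such a band flags. That is why the scenario pairs SLO alarms with anomaly detection rather than choosing one or the other.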

Trade-offs and continuous improvement. They keep a trade-off register: the catalog cache accepts a 60-second staleness budget (with documented invalidation on publish); viewing-progress uses DynamoDB eventually consistent reads (accepting brief staleness in exchange for twice the reads per unit of read capacity) but strongly consistent reads on the resume-playback call; billing keeps synchronous Aurora storage replication (durability over a few ms of write latency). Each optimization ships behind a canary with before/after CloudWatch evidence, and the backlog is ranked by price-performance return in the Well-Architected Tool.
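The catalog cache's 60-second staleness budget with invalidation on publish is the classic cache-aside pattern with a TTL. Below is a minimal in-process sketch. In the scenario, the store would be ElastiCache (Valkey). The class name, the `loader` callback, and the injectable clock are assumptions, added so the behaviour is easy to test.

```python
import time
from typing import Any, Callable, Dict, Tuple


class StalenessBudgetCache:
    """Cache-aside with a TTL equal to the accepted staleness budget."""

    def __init__(self, budget_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.budget_seconds = budget_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str, loader: Callable[[str], Any]) -> Any:
        """Serve from cache while within budget; otherwise reload from the source."""
        entry = self._entries.get(key)
        now = self.clock()
        if entry is not None and now - entry[0] < self.budget_seconds:
            return entry[1]
        value = loader(key)
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: str) -> None:
        """Call on publish so changes appear before the budget expires."""
        self._entries.pop(key, None)


# Usage with a fake clock so the example is deterministic.
fake_now = [0.0]
loads = []


def load_from_database(key: str) -> str:
    loads.append(key)                       # record each trip to the source of truth
    return f"catalog page for {key}"


cache = StalenessBudgetCache(60.0, clock=lambda: fake_now[0])
cache.get("home", load_from_database)       # t=0: miss, loads from the database
fake_now[0] = 30.0
cache.get("home", load_from_database)       # t=30: hit, within the 60 s budget
fake_now[0] = 61.0
cache.get("home", load_from_database)       # t=61: miss, budget exceeded
cache.invalidate("home")                    # an editor publishes a change
cache.get("home", load_from_database)       # miss, entry was invalidated
print(len(loads))                           # 3
```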

Measurable outcome. Within two quarters: catalog browse p99 falls from 1.4 s to 220 ms; live-event start-up p95 from 4.1 s to 1.6 s; session-store read p99 from 40 ms to under 2 ms (DAX); fleet CPU utilization from 22% to ~58%; and compute price-performance improves roughly 35% on the Graviton-migrated tier — all while the Well-Architected Tool’s Performance Efficiency high-risk items drop from 14 to 1.


What’s next

Part 5 of the AWS Well-Architected Framework series turns to the Cost Optimization pillar — practicing cloud financial management, expenditure and usage awareness, selecting cost-effective resources, managing supply against demand, and optimizing over time.


Vinod is a Senior Cloud Architect (22+ yrs) — available for Azure / AWS / GCP architecture, landing zones, and migrations.



// part 4 of 6 · AWS Well-Architected Framework
