Storage is where most “the database is slow” tickets actually end. Teams provision a volume by capacity, pick a type from muscle memory, and never look at the throughput ceiling the instance imposes underneath it. The result is a gp3 volume provisioned for 16,000 IOPS and 1,000 MiB/s bolted to an instance whose EBS bandwidth tops out at 4,750 Mbps (about 594 MB/s): money spent on numbers the instance can never reach. This guide is the mental model and the concrete knobs I use to size and tune block and file storage on AWS: what each EBS type is actually for, how gp3 and io2 decouple IOPS, throughput, and capacity, where the instance becomes the bottleneck, and how EFS throughput modes change the calculus for shared file workloads. Everything here is verifiable with fio and CloudWatch, and I’ll show both.
1. EBS volume types by workload
There are four types worth provisioning in 2026. Pick by access pattern, not by habit.
| Type | Media | Best for | Max IOPS / vol | Max throughput / vol |
|---|---|---|---|---|
| gp3 | SSD | General purpose; boot, most apps, mid-tier DBs | 16,000 | 1,000 MiB/s |
| io2 Block Express | SSD | Latency-sensitive, high-IOPS DBs; sub-ms, durable | 256,000 | 4,000 MiB/s |
| st1 | HDD | Large sequential, throughput-bound (logs, big-data scans) | 500 | 500 MiB/s |
| sc1 | HDD | Cold, infrequently accessed, lowest cost | 250 | 250 MiB/s |
The decision rules I apply:
- Default to `gp3`. It is cheaper than the legacy `gp2` for the same baseline and lets you buy IOPS and throughput independently of size. There is almost no reason to provision `gp2` on a new system.
- Reach for `io2` Block Express only when you need it: sustained IOPS above 16,000, consistent sub-millisecond latency under load, 99.999% durability, or volumes larger than 16 TiB. Block Express is the substrate that unlocks the high ceilings and is available on Nitro instances.
- `st1`/`sc1` are HDD and throughput-optimized, not IOPS devices. They are excellent for streaming reads of large files and terrible for random small I/O, and you cannot boot from them. `st1` uses a throughput burst-credit model; `sc1` is the cold, cheapest tier.
Rule of thumb: if the workload is random and small-block (databases, busy filesystems), it is an IOPS problem, so use SSD (`gp3`/`io2`). If it is large and sequential (log ingestion, analytics scans), it is a throughput problem: consider `st1`, but measure, because a well-provisioned `gp3` at 1,000 MiB/s often wins on latency.
2. Decoupling IOPS, throughput, and capacity
The single most useful property of gp3 and io2 is that the three dimensions are separately provisionable. On gp2, IOPS scaled with size (3 IOPS/GiB), so you used to oversize a volume just to buy performance. That coupling is gone.
gp3 baseline is 3,000 IOPS and 125 MiB/s at any size, and you provision above that up to 16,000 IOPS and 1,000 MiB/s. The throughput ceiling you can buy also scales with provisioned IOPS — you get up to 0.25 MiB/s per IOPS, so 1,000 MiB/s requires at least 4,000 provisioned IOPS.
```hcl
resource "aws_ebs_volume" "data" {
  availability_zone = "us-east-1a"
  size              = 200  # GiB, sized for capacity only
  type              = "gp3"
  iops              = 8000 # decoupled from size
  throughput        = 500  # MiB/s, decoupled from size
  encrypted         = true
  kms_key_id        = aws_kms_key.ebs.arn
}
```
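A proposed gp3 configuration has to satisfy several limits at once, and it is easy to request throughput that the IOPS can't back. Below is a minimal Python sketch of a plan-time check. It uses the limits stated in this section, plus one the text leaves implicit: gp3 allows at most 500 provisioned IOPS per GiB, so IOPS above the 3,000 baseline need capacity behind them. The function name `gp3_violations` is hypothetical. Confirm the constants against current AWS documentation before relying on them.

```python
# Limits as stated in this guide; verify against current AWS documentation.
GP3_MIN_IOPS, GP3_MAX_IOPS = 3_000, 16_000
GP3_MIN_MIBPS, GP3_MAX_MIBPS = 125, 1_000
GP3_MIBPS_PER_IOPS = 0.25   # throughput ceiling scales with provisioned IOPS
GP3_IOPS_PER_GIB = 500      # IOPS above the baseline need capacity behind them
GP3_MAX_SIZE_GIB = 16_384


def gp3_violations(size_gib: int, iops: int, throughput_mibps: int) -> list:
    """Return human-readable problems with a proposed gp3 volume (empty list = OK)."""
    problems = []
    if not 1 <= size_gib <= GP3_MAX_SIZE_GIB:
        problems.append(f"size {size_gib} GiB outside 1..{GP3_MAX_SIZE_GIB}")
    if not GP3_MIN_IOPS <= iops <= GP3_MAX_IOPS:
        problems.append(f"iops {iops} outside {GP3_MIN_IOPS}..{GP3_MAX_IOPS}")
    if not GP3_MIN_MIBPS <= throughput_mibps <= GP3_MAX_MIBPS:
        problems.append(
            f"throughput {throughput_mibps} MiB/s outside {GP3_MIN_MIBPS}..{GP3_MAX_MIBPS}"
        )
    # The 3,000 IOPS baseline is available at any size; more than that needs 500 IOPS/GiB.
    max_iops_for_size = max(GP3_MIN_IOPS, GP3_IOPS_PER_GIB * size_gib)
    if iops > max_iops_for_size:
        problems.append(f"iops {iops} exceeds {max_iops_for_size} allowed for {size_gib} GiB")
    if throughput_mibps > GP3_MIBPS_PER_IOPS * iops:
        needed = throughput_mibps / GP3_MIBPS_PER_IOPS
        problems.append(
            f"throughput {throughput_mibps} MiB/s needs at least {needed:.0f} provisioned IOPS"
        )
    return problems


# The Terraform example above: 200 GiB, 8,000 IOPS, 500 MiB/s.
print(gp3_violations(200, 8_000, 500))    # []
print(gp3_violations(200, 3_000, 1_000))  # reports that 1,000 MiB/s needs 4,000 IOPS
```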
For io2, you provision IOPS directly, bounded by a ratio of IOPS to capacity (up to 1,000 IOPS/GiB), and Block Express raises the per-volume ceiling to 256,000 IOPS and 4,000 MiB/s:
```hcl
resource "aws_ebs_volume" "oltp" {
  availability_zone = "us-east-1a"
  size              = 500
  type              = "io2" # Block Express on supported Nitro instances
  iops              = 64000 # well within the ratio: 500 GiB x 1,000 IOPS/GiB allows 500k, capped at 256k per volume
  encrypted         = true
}
```
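The io2 ceiling is the smaller of two limits: the size ratio and the per-volume cap. Below is a small Python sketch that uses the 1,000 IOPS/GiB ratio and the 256,000 IOPS Block Express cap stated above. The helper names are hypothetical.

```python
import math

IO2_IOPS_PER_GIB = 1_000  # provisioning ratio stated above
IO2_MAX_IOPS = 256_000    # Block Express per-volume ceiling


def io2_max_iops(size_gib: int) -> int:
    """Highest IOPS an io2 Block Express volume of this size can be provisioned with."""
    return min(IO2_IOPS_PER_GIB * size_gib, IO2_MAX_IOPS)


def io2_min_size_gib(iops: int) -> int:
    """Smallest volume whose size permits the requested IOPS under the ratio."""
    return math.ceil(iops / IO2_IOPS_PER_GIB)


print(io2_max_iops(500))         # 256000: the per-volume cap binds, not the ratio
print(io2_min_size_gib(64_000))  # 64: the 500 GiB example is sized for capacity, not IOPS
```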
Modifying a volume in place is online via Elastic Volumes — no detach, no downtime:
```bash
aws ec2 modify-volume \
  --volume-id vol-0abc123 \
  --volume-type gp3 \
  --iops 10000 \
  --throughput 700

# Watch the modification progress; the volume stays attached and usable
aws ec2 describe-volumes-modifications --volume-ids vol-0abc123 \
  --query 'VolumesModifications[0].[ModificationState,Progress]' --output text
```
Two operational caveats that bite people: after a modification completes the volume enters an optimizing state where performance is between old and new for a while, and a given volume can only be modified once every 6 hours. Plan changes; don’t thrash them.
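One way to avoid thrashing is to check the cooldown before you call modify-volume again. Below is a minimal Python sketch. It takes one entry of `VolumesModifications` in the shape boto3's `describe_volumes_modifications` returns, with `StartTime` already parsed to a timezone-aware `datetime`. It assumes the six-hour window is counted from the previous modification's `StartTime`. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MODIFY_COOLDOWN = timedelta(hours=6)


def next_modification_allowed_at(modification: dict) -> datetime:
    """Earliest time to attempt another modify-volume (assumes window starts at StartTime)."""
    return modification["StartTime"] + MODIFY_COOLDOWN


def can_modify_now(modification: dict, now: datetime) -> bool:
    """False while a modification is still in progress or the cooldown hasn't elapsed."""
    if modification["ModificationState"] == "modifying":
        return False
    return now >= next_modification_allowed_at(modification)


start = datetime(2026, 1, 1, 3, 0, tzinfo=timezone.utc)  # hypothetical record
record = {"ModificationState": "optimizing", "StartTime": start}
print(can_modify_now(record, start + timedelta(hours=2)))  # False
```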
3. The instance bandwidth ceiling
This is the section that saves the most money. A volume’s provisioned numbers are the most the volume can do; the instance imposes its own EBS bandwidth and IOPS limits, and those are often lower. AWS publishes per-instance “EBS-optimized” limits: a baseline and, on smaller sizes, a maximum the instance can burst to for 30 minutes. The IOPS figures are measured at a 16 KiB I/O size and the throughput figures at 128 KiB.
Concretely: an r5.2xlarge is rated at up to 4,750 Mbps of EBS bandwidth (593.75 MB/s, about 566 MiB/s), and that is its burst maximum, not its sustained baseline. Attaching a single gp3 provisioned for 16,000 IOPS and 1,000 MiB/s to that instance is wasted spend. Even at burst, the instance caps you at under 60% of the throughput you paid for. The fix is to size the instance to the storage need, or to aggregate volumes when the instance has headroom.
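The min(volume, instance) comparison only works once the units agree. Instance EBS bandwidth is published in megabits per second, while gp3 throughput is provisioned in MiB/s. Below is a minimal Python sketch using the 4,750 Mbps and 1,000 MiB/s figures from this section. The function names are hypothetical.

```python
def mbps_to_mibps(megabits_per_s: float) -> float:
    """Convert decimal megabits/s (instance EBS bandwidth) to MiB/s (volume throughput)."""
    return megabits_per_s * 1_000_000 / 8 / 2**20


def effective_throughput_mibps(volume_mibps: float, instance_mbps: float) -> float:
    """Storage throughput is min(volume, instance): the lower ceiling wins."""
    return min(volume_mibps, mbps_to_mibps(instance_mbps))


volume = 1_000    # gp3 provisioned throughput, MiB/s
instance = 4_750  # instance EBS bandwidth, Mbps
usable = effective_throughput_mibps(volume, instance)
print(f"usable {usable:.0f} MiB/s, {usable / volume:.0%} of what was provisioned")
# usable 566 MiB/s, 57% of what was provisioned
```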
Check the limits before you provision the volume:
```bash
# Line breaks inside the quoted --query are plain whitespace; no backslashes needed there.
aws ec2 describe-instance-types \
  --instance-types m6i.large m6i.4xlarge \
  --query 'InstanceTypes[].{type: InstanceType,
    baseIOPS: EbsInfo.EbsOptimizedInfo.BaselineIops,
    burstIOPS: EbsInfo.EbsOptimizedInfo.MaximumIops,
    baseMBps: EbsInfo.EbsOptimizedInfo.BaselineThroughputInMBps,
    burstMBps: EbsInfo.EbsOptimizedInfo.MaximumThroughputInMBps}' \
  --output table
```
Smaller instances get an unlimited-duration baseline plus a burst bucket; the larger sizes in a family deliver their maximum continuously. If your workload is sustained (a busy database), size against the baseline, not the burst, or you will fall off a cliff after 30 minutes. On modern instances EBS optimization is on by default and not billable; on older types you may still need --ebs-optimized.
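The baseline-versus-burst choice can be made explicit in code. Below is a minimal Python sketch that picks which `EbsOptimizedInfo` figures a workload can rely on, given how long it runs. It treats the 30-minute burst window as a simple threshold. That is a simplification, because AWS allows the burst at least once every 24 hours. The function name and sample values are hypothetical.

```python
BURST_WINDOW_MINUTES = 30  # smaller sizes sustain their maximum for about 30 minutes


def sizing_ceiling(ebs_info: dict, workload_minutes: float) -> dict:
    """Choose which EbsOptimizedInfo figures a workload of this duration can rely on.

    `ebs_info` is the EbsInfo.EbsOptimizedInfo object from
    `aws ec2 describe-instance-types --output json`. Sizes that deliver their
    maximum continuously report equal Baseline* and Maximum* values.
    """
    prefix = "Maximum" if workload_minutes <= BURST_WINDOW_MINUTES else "Baseline"
    return {
        "iops": ebs_info[f"{prefix}Iops"],
        "throughput_mb_s": ebs_info[f"{prefix}ThroughputInMBps"],
    }


# Illustrative values only; substitute the output of describe-instance-types.
info = {
    "BaselineIops": 3_600, "MaximumIops": 18_750,
    "BaselineThroughputInMBps": 81.25, "MaximumThroughputInMBps": 593.75,
}
print(sizing_ceiling(info, workload_minutes=120))  # {'iops': 3600, 'throughput_mb_s': 81.25}
```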
When one instance has bandwidth headroom but a single volume’s per-volume ceiling is the limit, stripe. A RAID 0 across N gp3 volumes adds their per-volume ceilings together, up to the instance limit:
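```bash
# Two gp3 volumes, each provisioned for high throughput, striped.
# NVMe device names vary by instance; confirm them with lsblk first.
sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme1n1 /dev/nvme2n1
sudo mkfs.xfs /dev/md0
sudo mkdir -p /data
sudo mount /dev/md0 /data
```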
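RAID 0 gives no redundancy, so rely on EBS’s own durability and snapshots. Snapshots taken volume by volume are not crash-consistent across the members of a striped set. Either use multi-volume snapshots (`aws ec2 create-snapshots`), which are crash-consistent across an instance’s volumes, or freeze the filesystem (`xfs_freeze -f`) before snapshotting.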
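The striping arithmetic is a sum followed by a cap. Below is a Python sketch with hypothetical numbers: an instance assumed to sustain 2,400 MiB/s of EBS throughput. It also shows the point past which adding members buys nothing. The helper names are illustrative.

```python
import math


def striped_ceiling(member_ceilings: list, instance_ceiling: float) -> float:
    """RAID 0 sums its members' ceilings, but the instance still caps the total.

    Use one unit throughout (MiB/s for throughput, or IOPS).
    """
    return min(sum(member_ceilings), instance_ceiling)


def max_useful_members(per_member: float, instance_ceiling: float) -> int:
    """Beyond this many identical members, extra volumes add cost but no performance."""
    return math.ceil(instance_ceiling / per_member)


# Hypothetical instance assumed to sustain 2,400 MiB/s of EBS throughput.
print(striped_ceiling([1_000, 1_000], 2_400))         # 2000
print(striped_ceiling([1_000, 1_000, 1_000], 2_400))  # 2400
print(max_useful_members(1_000, 2_400))               # 3
```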
4. Multi-Attach, fast snapshot restore, and snapshot lifecycle
Multi-Attach lets a single io2 (or io1) volume attach to up to 16 Nitro instances in the same AZ concurrently. It is not a magic shared disk — it provides no coordination. You must run a cluster-aware filesystem (GFS2, OCFS2) or an application that arbitrates writes; mounting xfs/ext4 read-write on two instances corrupts the volume. Use it for clustered, fence-aware software, not as a poor man’s EFS.
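The Multi-Attach constraints above can be checked before anything is attached. Below is a minimal Python sketch that encodes only the rules stated in this section: io1/io2 only, at most 16 Nitro instances, a single AZ, and a cluster-aware filesystem when more than one node mounts it read-write. The `Attachment` type, the function name, and the instance IDs are hypothetical. `filesystem=None` stands for an application that arbitrates raw block access itself.

```python
from dataclasses import dataclass
from typing import Optional

MULTI_ATTACH_TYPES = {"io1", "io2"}
MAX_ATTACHMENTS = 16
CLUSTER_AWARE_FILESYSTEMS = {"gfs2", "ocfs2"}


@dataclass
class Attachment:
    instance_id: str
    availability_zone: str
    nitro: bool


def multi_attach_problems(volume_type: str, volume_az: str,
                          filesystem: Optional[str], attachments: list) -> list:
    """Check a proposed Multi-Attach layout, assuming every node mounts read-write."""
    problems = []
    if volume_type not in MULTI_ATTACH_TYPES:
        problems.append(f"{volume_type} does not support Multi-Attach")
    if len(attachments) > MAX_ATTACHMENTS:
        problems.append(f"{len(attachments)} attachments exceeds {MAX_ATTACHMENTS}")
    for a in attachments:
        if a.availability_zone != volume_az:
            problems.append(f"{a.instance_id} is in {a.availability_zone}, volume is in {volume_az}")
        if not a.nitro:
            problems.append(f"{a.instance_id} is not a Nitro instance")
    # Nothing coordinates writes, so a non-cluster filesystem on 2+ nodes corrupts the volume.
    if len(attachments) > 1 and filesystem is not None and filesystem not in CLUSTER_AWARE_FILESYSTEMS:
        problems.append(f"{filesystem} is not cluster-aware; concurrent mounts will corrupt it")
    return problems


nodes = [Attachment("i-aaa", "us-east-1a", True), Attachment("i-bbb", "us-east-1a", True)]
print(multi_attach_problems("io2", "us-east-1a", "xfs", nodes))
# ['xfs is not cluster-aware; concurrent mounts will corrupt it']
```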
Fast Snapshot Restore (FSR) removes the lazy-load penalty. Normally a volume restored from a snapshot loads blocks from S3 on first touch, so the first read of each block is slow. FSR pre-initializes the volume so it delivers full provisioned performance immediately. That makes it essential for golden-image boot volumes and for returning large restored data volumes to service quickly. It is billed per snapshot, per AZ, for every hour it is enabled.
```bash
aws ec2 enable-fast-snapshot-restores \
  --availability-zones us-east-1a us-east-1b \
  --source-snapshot-ids snap-0abc123
```
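Automate retention with Data Lifecycle Manager rather than cron jobs and Lambda glue. Below are the policy details for a policy that snapshots nightly, keeps 14, and copies to a DR Region (the document you pass to `aws dlm create-lifecycle-policy --policy-details`):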
```json
{
  "ResourceTypes": ["VOLUME"],
  "TargetTags": [{ "Key": "Backup", "Value": "daily" }],
  "Schedules": [
    {
      "Name": "daily-14d",
      "CreateRule": { "Interval": 24, "IntervalUnit": "HOURS", "Times": ["03:00"] },
      "RetainRule": { "Count": 14 },
      "CopyTags": true,
      "CrossRegionCopyRules": [
        {
          "TargetRegion": "us-west-2",
          "Encrypted": true,
          "CmkArn": "arn:aws:kms:us-west-2:111122223333:key/abcd-1234",
          "RetainRule": { "Interval": 14, "IntervalUnit": "DAYS" }
        }
      ]
    }
  ]
}
```
EBS snapshots are incremental and block-level: only changed blocks since the last snapshot are stored, so frequent snapshots are cheap. Deleting an old snapshot never breaks a newer one — AWS re-references the blocks the newer snapshot still needs.
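To see why deleting an old snapshot cannot break a newer one, here is a toy Python model of block-level incremental snapshots. It illustrates the reference idea only and is not how EBS stores snapshots internally. The class and its methods are hypothetical.

```python
class ToySnapshotStore:
    """Toy model of incremental block snapshots (not EBS's real internals)."""

    def __init__(self) -> None:
        self._snapshots: dict = {}  # snapshot id -> {block index: block content}

    def stored_blocks(self) -> set:
        """Unique (block index, content) pairs still referenced by any snapshot."""
        return {item for snap in self._snapshots.values() for item in snap.items()}

    def take(self, snap_id: str, volume: dict) -> int:
        """Snapshot the volume; return how many blocks had to be newly stored."""
        before = self.stored_blocks()
        self._snapshots[snap_id] = dict(volume)
        return len(self.stored_blocks() - before)

    def delete(self, snap_id: str) -> None:
        """Drop a snapshot; blocks another snapshot still references stay stored."""
        del self._snapshots[snap_id]

    def restore(self, snap_id: str) -> dict:
        return dict(self._snapshots[snap_id])


store = ToySnapshotStore()
volume = {0: "a", 1: "b", 2: "c"}
print(store.take("snap-1", volume))  # 3: the first snapshot stores every block
volume[1] = "b2"
print(store.take("snap-2", volume))  # 1: only the changed block is new
store.delete("snap-1")
print(store.restore("snap-2"))       # {0: 'a', 1: 'b2', 2: 'c'}
print(len(store.stored_blocks()))    # 3: the old version of block 1 is gone
```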
5. EFS performance modes, throughput modes, and elastic throughput
EFS is NFSv4.1, multi-AZ, and grows automatically. Its performance is governed by two orthogonal settings that people routinely confuse.
Performance mode (set at creation, immutable):
- General Purpose — lowest per-operation latency. The right default; required for latency-sensitive and most interactive workloads. Use this unless proven otherwise.
- Max I/O — higher aggregate throughput and IOPS by trading away latency. AWS now steers nearly everyone to General Purpose with Elastic throughput; Max I/O is a legacy choice for massively parallel, latency-tolerant jobs.
Throughput mode (changeable, subject to a cooldown):
- Elastic — throughput scales automatically with demand, up to high regional limits (on the order of GiB/s for reads), and you pay only for the data transferred. This is the default I recommend for spiky or unpredictable workloads; no provisioning, no cliffs.
- Provisioned — you set a fixed throughput independent of stored size. Use it when you have a steady, known high throughput need on a small filesystem, where Elastic’s per-request pricing would cost more.
- Bursting — throughput scales with stored data (baseline 50 KiB/s per GiB) and earns burst credits. Cheap, but a small filesystem starves; this is why so many EFS performance complaints trace back to a near-empty Bursting filesystem that ran out of credits.
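The Bursting starvation problem is easy to quantify from the 50 KiB/s-per-GiB baseline. Below is a Python sketch with a hypothetical helper name. It ignores burst credits and any minimum throughput floor AWS may apply.

```python
EFS_BURSTING_KIB_PER_S_PER_GIB = 50  # Bursting baseline rate stated above


def bursting_baseline_mibps(stored_gib: float) -> float:
    """Baseline throughput of a Bursting-mode file system, in MiB/s (credits ignored)."""
    return stored_gib * EFS_BURSTING_KIB_PER_S_PER_GIB / 1024


for stored in (10, 100, 1024):
    print(f"{stored:>5} GiB stored -> {bursting_baseline_mibps(stored):6.2f} MiB/s baseline")
```

A 10 GiB file system earns a baseline of under 0.5 MiB/s. That is why a near-empty Bursting file system falls over once its credits run out.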
```hcl
resource "aws_efs_file_system" "shared" {
  encrypted        = true
  performance_mode = "generalPurpose"
  throughput_mode  = "elastic" # scales automatically, pay-per-use

  # EFS accepts one transition per lifecycle policy, so each gets its own block.
  lifecycle_policy {
    transition_to_ia = "AFTER_30_DAYS"
  }

  lifecycle_policy {
    transition_to_primary_storage_class = "AFTER_1_ACCESS"
  }
}
```
Switching to Provisioned for a steady high-throughput job:
```bash
aws efs update-file-system \
  --file-system-id fs-0abc123 \
  --throughput-mode provisioned \
  --provisioned-throughput-in-mibps 256
```
Throughput-mode changes and decreases in provisioned throughput are rate-limited (you can raise it, but reducing it or switching modes has a cooldown of roughly a day), so don’t treat it as an autoscaling knob.
6. EFS storage classes, lifecycle, and access points
EFS has Standard and Infrequent Access (IA) classes (plus One Zone variants for single-AZ cost savings). Lifecycle management moves files between Standard and IA based on access age. The `transition_to_primary_storage_class = "AFTER_1_ACCESS"` rule above promotes a file back to Standard the moment it is read again, so the IA per-access read charge doesn’t punish hot files that aged out. For most shared filesystems, IA cuts storage cost substantially with negligible behavioral change, because access is usually concentrated on a small fraction of the files.
Access points are the right way to hand EFS to multiple applications or containers. Each enforces a POSIX identity and a root directory, so an app physically cannot see another tenant’s files:
```hcl
resource "aws_efs_access_point" "app_a" {
  file_system_id = aws_efs_file_system.shared.id

  posix_user {
    uid = 1000
    gid = 1000
  }

  root_directory {
    path = "/app-a"

    creation_info {
      owner_uid   = 1000
      owner_gid   = 1000
      permissions = "0750"
    }
  }
}
```
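Pair access points with a file system policy. The statement below enforces encryption in transit by denying any request that does not arrive over TLS. On its own it does not require IAM authorization. To make a reachable mount target useless without credentials, also grant `elasticfilesystem:ClientMount` (and `elasticfilesystem:ClientWrite` where needed) only to specific IAM principals, and mount with the `iam` option.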
```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Deny",
    "Principal": { "AWS": "*" },
    "Action": "*",
    "Resource": "*",
    "Condition": { "Bool": { "aws:SecureTransport": "false" } }
  }]
}
```
Mount with the EFS helper so encryption-in-transit and the access point are wired correctly:
```bash
sudo mount -t efs -o tls,accesspoint=fsap-0abc123 fs-0abc123:/ /mnt/app-a
```
7. Benchmarking with fio and interpreting results
Never trust the spec sheet — measure the path you actually run. fio is the tool. Match the block size and pattern to your workload: 16 KiB random for database-like I/O, large sequential for streaming.
Random read IOPS (database-style), with O_DIRECT to bypass the page cache so you measure the device, not RAM:
```bash
sudo fio --name=randread --filename=/data/fiotest --direct=1 \
  --rw=randread --bs=16k --iodepth=64 --numjobs=4 --group_reporting \
  --size=10G --runtime=120 --time_based --ioengine=libaio
```
Sequential throughput (analytics/log-streaming style):
```bash
sudo fio --name=seqread --filename=/data/fiotest --direct=1 \
  --rw=read --bs=1M --iodepth=32 --numjobs=2 --group_reporting \
  --size=20G --runtime=120 --time_based --ioengine=libaio
```
Reading the output:
- IOPS: compare against `min(volume provisioned IOPS, instance IOPS limit)`. If you fall short of both, the bottleneck is elsewhere (filesystem, single-threaded I/O, too-shallow queue depth).
- bw (bandwidth): compare against `min(volume throughput, instance EBS bandwidth)`. Hitting the instance number and not the volume’s confirms you are instance-bound; that’s your signal to resize the instance, not the volume.
- clat / lat percentiles: `gp3` typically lands around single-digit-millisecond latency; `io2` Block Express is sub-millisecond. A p99 far above the median under load means queueing, usually because iodepth or numjobs is higher than the device can absorb. Latency is the metric users feel; watch the percentiles, not the average.
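If you rerun the fio commands above with `--output-format=json --output=<file>`, the comparison can be scripted. Below is a minimal Python sketch that reads fields from fio's JSON report (`iops`, `bw` in KiB/s, `clat_ns.percentile`). It assumes `--group_reporting`, so that `jobs[0]` holds the aggregate, and fio's default percentile list. The 90% tolerance for calling a limit "hit" is an arbitrary assumption, as are the helper names.

```python
import json


def load_report(path: str) -> dict:
    """Load a report written by fio --output-format=json --output=<path>."""
    with open(path) as f:
        return json.load(f)


def summarize_fio(report: dict, direction: str = "read") -> dict:
    """IOPS, bandwidth, and completion-latency percentiles for one direction."""
    stats = report["jobs"][0][direction]  # aggregate job under --group_reporting
    percentiles = stats["clat_ns"]["percentile"]
    return {
        "iops": stats["iops"],
        "bw_mibps": stats["bw"] / 1024,   # fio reports bw in KiB/s
        "p50_ms": percentiles["50.000000"] / 1e6,
        "p99_ms": percentiles["99.000000"] / 1e6,
    }


def bottleneck(measured: float, volume_limit: float, instance_limit: float,
               tolerance: float = 0.9) -> str:
    """Name the limit a measurement is pinned against (all values in one unit)."""
    ceiling = min(volume_limit, instance_limit)
    if measured < tolerance * ceiling:
        return "neither: check queue depth, threads, or the filesystem"
    return "instance" if instance_limit < volume_limit else "volume"
```

For example, pass `summarize_fio(load_report("randread.json"))["iops"]` to `bottleneck` along with the volume's provisioned IOPS and the instance's IOPS limit.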
A fresh volume restored from snapshot without FSR will read slow on first touch — that is lazy loading, not the steady-state number. Either enable FSR or pre-warm by reading every block before you benchmark.
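AWS's own guidance pre-warms volumes with `dd` or `fio`; the idea is simply to read every block once. Below is a single-threaded Python sketch of the same idea, for illustration only. A parallel `fio` read will finish much faster on a large volume. The device path is an assumption: run the function as root against the restored device.

```python
CHUNK_BYTES = 1 << 20  # read 1 MiB at a time


def prewarm(path: str, chunk_bytes: int = CHUNK_BYTES) -> int:
    """Read every byte of a block device (or file) once; return the bytes read.

    Touching each block forces lazily loaded data to be fetched before you benchmark.
    """
    total = 0
    with open(path, "rb", buffering=0) as device:
        while True:
            data = device.read(chunk_bytes)
            if not data:
                break
            total += len(data)
    return total


# Hypothetical device path; run as root against the volume restored from a snapshot:
# print(prewarm("/dev/nvme1n1"))
```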
Verify
Confirm the storage is performing to the limit that actually applies, end to end.
```bash
# 1. Confirm provisioned volume settings took effect
aws ec2 describe-volumes --volume-ids vol-0abc123 \
  --query 'Volumes[0].{type:VolumeType,size:Size,iops:Iops,throughput:Throughput,state:State}'

# 2. Confirm the instance's EBS ceiling (the real cap)
aws ec2 describe-instance-types --instance-types m6i.large \
  --query 'InstanceTypes[0].EbsInfo.EbsOptimizedInfo'

# Query window: the last hour (GNU date; on macOS use: date -u -v-1H '+%Y-%m-%dT%H:%M:%SZ')
START="$(date -u -d '1 hour ago' '+%Y-%m-%dT%H:%M:%SZ')"
END="$(date -u '+%Y-%m-%dT%H:%M:%SZ')"

# 3. Measure actual achieved performance in CloudWatch (average IOPS = Sum / 300)
aws cloudwatch get-metric-statistics --namespace AWS/EBS \
  --metric-name VolumeReadOps --dimensions Name=VolumeId,Value=vol-0abc123 \
  --start-time "$START" --end-time "$END" --period 300 --statistics Sum

# 4. Check whether the instance is throttling EBS. EBSByteBalance% (and EBSIOBalance%)
#    report the instance's remaining EBS burst on sizes that burst; near 0 == bottleneck found.
#    VolumeThroughputPercentage and BurstBalance are per-volume metrics that only some
#    volume types emit; they do not measure the instance limit.
aws cloudwatch get-metric-statistics --namespace AWS/EC2 \
  --metric-name EBSByteBalance% --dimensions Name=InstanceId,Value=i-0abc123 \
  --start-time "$START" --end-time "$END" --period 300 --statistics Minimum
```
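The `Sum` statistic counts operations (or bytes) per period. Divide it by the period before you compare it with provisioned IOPS or MiB/s. Below is a Python sketch that works on the JSON returned by `aws cloudwatch get-metric-statistics --output json`. The helper names and the sample response are hypothetical.

```python
def average_iops(ops_sum: float, period_s: int) -> float:
    """VolumeReadOps/VolumeWriteOps fetched with --statistics Sum: divide by the period."""
    return ops_sum / period_s


def average_mibps(bytes_sum: float, period_s: int) -> float:
    """VolumeReadBytes/VolumeWriteBytes fetched with --statistics Sum, as MiB/s."""
    return bytes_sum / period_s / 2**20


def peak_average_iops(response: dict, period_s: int) -> float:
    """Busiest period in a get-metric-statistics response (0.0 if there are no datapoints)."""
    sums = [point["Sum"] for point in response.get("Datapoints", [])]
    return max(sums, default=0.0) / period_s


# Hypothetical response shape from `aws cloudwatch get-metric-statistics --output json`.
response = {"Datapoints": [{"Sum": 2_400_000.0}, {"Sum": 1_500_000.0}]}
print(peak_average_iops(response, 300))  # 8000.0
```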
For EFS, confirm throughput mode and watch the burst/IO limit percentage:
```bash
aws efs describe-file-systems --file-system-id fs-0abc123 \
  --query 'FileSystems[0].{mode:ThroughputMode,prov:ProvisionedThroughputInMibps,perf:PerformanceMode}'

# PercentIOLimit near 100 on General Purpose means you should consider Elastic/Max I/O
# (GNU date below; on macOS replace -d '1 hour ago' with -v-1H)
aws cloudwatch get-metric-statistics --namespace AWS/EFS \
  --metric-name PercentIOLimit --dimensions Name=FileSystemId,Value=fs-0abc123 \
  --start-time "$(date -u -d '1 hour ago' '+%Y-%m-%dT%H:%M:%SZ')" \
  --end-time "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" --period 300 --statistics Maximum
```
Enterprise scenario
A fintech platform team ran a PostgreSQL fleet on r5.2xlarge instances, each with a single 4 TiB gp3 volume provisioned to the full 16,000 IOPS and 1,000 MiB/s. Their batch reconciliation job, a heavy nightly read-write pass, consistently flatlined just under 600 MB/s no matter how high they pushed the volume’s provisioned throughput, and p99 query latency spiked into the seconds during the window. The on-call instinct was “buy more IOPS.” They had done that twice, and the only thing that went up was the bill.
The constraint was the instance, not the volume. An r5.2xlarge is rated at up to 4,750 Mbps (593.75 MB/s) of EBS bandwidth, almost exactly the ceiling they kept hitting, and that figure is its burst maximum rather than a sustained baseline. They were paying for 1,000 MiB/s and were physically capped at about 566 MiB/s.
Two changes fixed it. They moved the database to r6i.4xlarge, which is rated at up to 10 Gbps of EBS bandwidth (roughly double the r5.2xlarge ceiling), and they migrated the hottest volumes to io2 Block Express for the latency floor under concurrent load. They also right-sized the volumes’ provisioned performance down to the new instance’s baseline, recovering the over-provisioning spend. They codified the rule so it can’t regress: provisioned volume performance (IOPS, plus throughput on gp3) must never exceed the instance’s published EBS baseline.
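```hcl
# Guardrail: never provision more volume performance than the instance's EBS baseline.
# Fetch the instance's published EBS limits at plan time and clamp the volume to them.
data "aws_ec2_instance_type" "db" {
  instance_type = "r6i.4xlarge"
}

locals {
  instance_ebs_baseline_iops = data.aws_ec2_instance_type.db.ebs_performance_baseline_iops
  # Published in MB/s (decimal); convert to MiB/s before clamping a gp3 `throughput`.
  instance_ebs_baseline_mibps = floor(data.aws_ec2_instance_type.db.ebs_performance_baseline_throughput * 1000000 / 1048576)
}

resource "aws_ebs_volume" "pg_data" {
  availability_zone = "us-east-1a"
  size              = 4096
  type              = "io2"
  # `throughput` is only valid for gp3. On io2, throughput follows from IOPS x I/O size,
  # so clamp provisioned IOPS to the instance baseline instead.
  iops      = min(64000, local.instance_ebs_baseline_iops)
  encrypted = true

  # On a gp3 volume, clamp throughput the same way:
  # throughput = min(1000, local.instance_ebs_baseline_mibps)
}
```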
The reconciliation window dropped from 50 minutes to 22, p99 latency fell back under 10 ms, and the monthly storage bill went down because the over-provisioned IOPS were trimmed. The lesson the team internalized: storage performance is min(volume, instance), and the instance limit is the one nobody checks first.