Tuning Block and File Storage on AWS: EBS gp3/io2, EFS Throughput Modes, and Workload-Driven Sizing

Storage is where most “the database is slow” tickets actually end. Teams provision a volume by capacity, pick a type from muscle memory, and never look at the throughput ceiling the instance imposes underneath it. The result is a volume provisioned for 16,000 IOPS and 1,000 MiB/s bolted to an instance whose EBS link tops out at 4,750 Mbps (about 594 MiB/s): money spent on numbers the kernel can never reach. This is the most expensive misunderstanding in AWS storage, and it is invisible: the volume reports its full provisioned numbers, CloudWatch shows you well under them, and nobody connects the two because the limit that bit you lives on the instance, not the volume.

This guide is the mental model and the concrete knobs I use to size and tune block and file storage on AWS: what each EBS type is actually for, how gp3 and io2 Block Express decouple IOPS, throughput, and capacity, where the instance becomes the bottleneck, and how EFS throughput modes change the calculus for shared file workloads. The governing equation is one line — achieved performance = min(volume limit, instance limit, filesystem/app limit) — and everything in this article is an elaboration of where each of those three terms comes from and how to read it off a real system. Everything here is verifiable with fio and CloudWatch, and I show both. Because this is a reference you will return to while sizing a fleet or chasing a latency spike, the volume types, the limits, the throughput modes, and the failure modes are all laid out as scannable tables — read the prose once, then keep the tables open.

By the end you will stop sizing storage by capacity alone. When a workload is slow you will know whether you face a volume ceiling, an instance EBS-bandwidth cap, a near-empty EFS Bursting filesystem out of credits, a too-shallow queue depth hiding real headroom, or a snapshot that reads slow only because nobody enabled Fast Snapshot Restore. Knowing which in five minutes — from two CloudWatch metrics and one fio run — is what separates a right-sized fleet from a bill full of numbers the hardware can never deliver.

What problem this solves

EBS and EFS hide enormous machinery so you can attach a disk and run. That abstraction is a gift until performance matters; then the defaults and the muscle-memory choices cost you twice: once in latency users feel, once in spend on capacity and IOPS the instance can never consume. The pain is concrete: a reconciliation batch that flatlines at 600 MiB/s no matter how high you push the volume, a shared filesystem that crawls on a near-empty Bursting EFS, a restored DR volume that reads at a tenth of its rated speed for the first hour, a gp2 boot volume silently throttled because nobody migrated it to gp3.

What breaks without this knowledge: engineers “buy more IOPS” (no effect, because the instance was the cap), oversize volumes to chase performance the old gp2-era way (3 IOPS/GiB coupling that no longer applies), pick io2 Block Express for a workload that gp3 would serve at a quarter of the price, or mount EFS with the wrong throughput mode and watch a filesystem starve. Meanwhile the actual constraint — the instance’s published EBS baseline, an exhausted burst-credit bucket, a single-threaded I/O pattern that a deeper queue would saturate — sits there, perfectly measurable, ignored.

Who hits this: anyone running databases (random small-block, IOPS-bound), analytics and log pipelines (large sequential, throughput-bound), container fleets sharing EFS, and DR/golden-image workflows that restore from snapshots. It bites hardest on right-sizing reviews (where over-provisioned volumes hide in plain sight), on latency-sensitive OLTP under concurrency (where gp3’s ceiling or queueing shows up), and on cost audits (where the gap between provisioned and achieved is real money). The fix is almost never “a bigger volume” — it’s “find the term in min(volume, instance, app) that’s actually binding and move that one.”

To frame the whole field before the deep dive, here is every performance-limit class this article covers, the question it forces, and the one place to look first:

| Limit class | What it caps | First question to ask | First place to look | Most common single cause |
|---|---|---|---|---|
| Volume per-volume ceiling | One volume’s max IOPS / throughput | Am I at the volume’s rated max? | describe-volumes (Iops, Throughput) | gp3 left at 3,000/125 defaults |
| Instance EBS bandwidth | All EBS traffic from the instance | Does the instance cap below the volume? | describe-instance-types EbsOptimizedInfo | Big volume on a small instance |
| gp3 throughput-per-IOPS ratio | Throughput you can buy vs IOPS | Did I provision enough IOPS to buy the MiB/s? | Provisioned iops vs throughput | 1,000 MiB/s needs ≥ 4,000 IOPS |
| EFS throughput mode | Filesystem aggregate throughput | Bursting on a near-empty filesystem? | describe-file-systems ThroughputMode | Bursting starves below ~1 TiB |
| Queue depth / parallelism | Achievable IOPS at the app | Is iodepth/numjobs deep enough? | fio iodepth vs achieved IOPS | Single-threaded I/O, iodepth=1 |
| Snapshot lazy-load | First-touch read speed | Is this a fresh restore without FSR? | First-read latency vs steady state | Restore without Fast Snapshot Restore |

Learning objectives

By the end of this article you can:

Prerequisites & where this fits

You should already understand the basics: an EBS volume is network-attached block storage bound to one Availability Zone and (normally) one EC2 instance; EFS is an NFSv4.1 file system reachable from many instances across AZs; and an EC2 instance is the compute that mounts them. You should be comfortable running aws CLI with --query, reading JSON output, and reading a Terraform resource block. Familiarity with Linux filesystems (mkfs, mount, /dev/nvme*), the page cache, and basic IOPS-vs-throughput-vs-latency vocabulary helps.

This sits in the Compute & Storage track. It assumes the EC2 fundamentals from Amazon EC2, In Depth: Instance Types, AMIs, EBS, User Data, IMDS & Every Launch Option, and it is the performance-and-tuning companion to the breadth survey in AWS Block & File Storage, In Depth: EBS, EFS, FSx & Instance Store. It pairs with AWS Observability, In Depth: CloudWatch, CloudTrail, Config & EventBridge because every limit here is read off a CloudWatch metric, and with Amazon RDS & Aurora, In Depth: Engines, Multi-AZ, Read Replicas, Backups & Every Option, whose managed storage abstracts the same physics you tune by hand here.

A quick map of who owns which limit during a sizing review or an incident, so you reason about the right layer:

| Layer | What lives here | Who usually owns it | Performance class it can cause |
|---|---|---|---|
| Application / DB engine | Block size, queue depth, fsync pattern | App / DBA | Queue-starved IOPS; fsync-bound latency |
| Filesystem / RAID | xfs/ext4, mdadm stripe, mount opts | Platform / SRE | Single-volume ceiling when stripe absent |
| EBS volume | Type, provisioned IOPS/throughput | Platform | Volume per-volume ceiling |
| EC2 instance | EBS-optimized baseline/burst | Platform | Instance EBS-bandwidth cap (the silent one) |
| EFS file system | Performance + throughput mode | Platform | Credit starvation; mode mismatch |
| Snapshot / DLM | FSR, lifecycle, incremental chain | Platform / Backup | Lazy-load slow first touch |

Core concepts

Five mental models make every later decision obvious.

Achieved performance is the minimum of several ceilings, not the volume’s number. The volume’s provisioned IOPS and throughput are a maximum the volume can do. The instance imposes its own EBS-optimized bandwidth and IOPS limit, and that is usually lower. The filesystem and application impose a third (block size, queue depth, fsync). What you actually get is min(volume, instance, app). Almost every “we paid for performance we don’t see” story is the volume number being the largest of the three while the instance or the app is the one binding.
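A minimal sketch of that rule in Python, useful mainly as a way to force yourself to write down all three terms. The function name and the sample figures are illustrative assumptions, not measurements from a real system:

```python
def achieved(volume_limit, instance_limit, app_limit):
    """Return (achievable figure, name of the binding term)."""
    limits = {
        "volume": volume_limit,
        "instance": instance_limit,
        "app": app_limit,
    }
    # The binding term is whichever ceiling is lowest.
    binding = min(limits, key=limits.get)
    return limits[binding], binding


# Hypothetical IOPS figures: a large volume on a smaller instance.
iops, binding = achieved(volume_limit=16_000, instance_limit=10_000, app_limit=12_000)
print(f"{iops:,} IOPS, bound by the {binding}")
```

Run the same function once for IOPS and once for throughput; the binding term is often different for each.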

IOPS, throughput, and capacity are three separate purchases on modern volumes. On the legacy gp2, IOPS scaled with size (3 IOPS/GiB), so you oversized a volume just to buy performance. gp3 and io2 break that coupling: you set capacity for how much data you store, IOPS for how many small operations per second, and throughput (MiB/s) for how much sequential bandwidth — independently, within ratios. Sizing storage is now three decisions, and conflating them is how you both overspend and under-provision at once.

Random-small is an IOPS problem; large-sequential is a throughput problem. Databases and busy filesystems do many tiny (4–16 KiB) random operations — that is an IOPS workload, and it wants SSD (gp3/io2). Log ingestion and analytics scans move large blocks sequentially — that is a throughput workload, where HDD st1 can be cost-effective, though a well-provisioned gp3 at 1,000 MiB/s often wins on latency. Naming the workload (random-small vs large-sequential) is the first fork in choosing a type.
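The fork can be made concrete with one multiplication: throughput is IOPS times block size. The sketch below assumes every application I/O counts as one operation against the volume and uses the gp3 baseline figures quoted in this article; the function names are mine:

```python
def throughput_mib_s(iops, block_kib):
    """Bandwidth moved by `iops` operations per second of `block_kib` KiB each."""
    return iops * block_kib / 1024


def binding_dimension(iops_limit, throughput_limit_mib_s, block_kib):
    """Which provisioned dimension a workload with this block size hits first."""
    if throughput_mib_s(iops_limit, block_kib) <= throughput_limit_mib_s:
        return "iops"
    return "throughput"


# gp3 baseline (3,000 IOPS, 125 MiB/s) under two block sizes.
for block_kib in (16, 256):
    print(block_kib, "KiB ->", binding_dimension(3_000, 125, block_kib))
```

At 16 KiB the volume runs out of IOPS long before it runs out of bandwidth; at 256 KiB it is the other way round, which is why the same volume can look generous to one workload and starved to another.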

EFS performance is governed by two orthogonal settings people routinely confuse. Performance mode (General Purpose vs Max I/O, immutable after creation) trades latency against aggregate ceiling. Throughput mode (Elastic, Provisioned, Bursting, changeable with a cooldown) governs how much aggregate throughput you get and how you pay. The classic EFS failure is a near-empty Bursting filesystem: throughput scales with stored data (50 KiB/s per GiB baseline), so a 100 GiB filesystem has a tiny baseline and starves once its burst credits run out.

Snapshots are incremental, and a fresh restore is lazy-loaded. EBS snapshots store only changed blocks since the last snapshot, so frequent snapshots are cheap and deleting an old one never breaks a newer one. But a volume restored from a snapshot loads each block from S3 on first touch, so the first read of every block is slow — that is lazy loading, not the steady-state number. Fast Snapshot Restore (FSR) pre-initializes the volume so it delivers full performance immediately. Benchmark a fresh restore without FSR and you measure S3 fetch latency, not the volume.

The vocabulary in one table

Before the deep sections, pin down every moving part. The glossary at the end repeats these for lookup; this table is the mental model side by side:

| Concept | One-line definition | Where it lives | Why it matters to performance |
|---|---|---|---|
| gp3 | General-purpose SSD; IOPS/throughput decoupled from size | EBS volume type | The default; cheaper than gp2, tunable |
| io2 Block Express | High-IOPS, sub-ms, durable SSD | EBS volume type | Only when gp3’s ceiling isn’t enough |
| st1 / sc1 | Throughput-optimized / cold HDD | EBS volume type | Large sequential, never random/boot |
| Provisioned IOPS | Small ops/sec you buy for the volume | Volume setting | Caps random-small performance |
| Provisioned throughput | MiB/s you buy for the volume | Volume setting | Caps sequential bandwidth |
| Instance EBS baseline | Sustained EBS bandwidth the instance allows | Instance attribute | The real cap nobody checks first |
| EBS-optimized burst | 30-min higher bandwidth on smaller sizes | Instance attribute | Misleads you if workload is sustained |
| Elastic Volumes | Online modify of type/IOPS/throughput | EBS feature | Change without downtime; 6 h cooldown |
| RAID 0 stripe | Aggregate N volumes’ ceilings | Filesystem (mdadm) | Beat single-volume ceiling; no redundancy |
| EFS performance mode | General Purpose vs Max I/O | EFS (immutable) | Latency vs aggregate ceiling |
| EFS throughput mode | Elastic / Provisioned / Bursting | EFS (cooldown) | How much throughput + how you pay |
| Burst credits | Earned headroom on st1 / EFS Bursting | Volume/FS state | Starve when exhausted → slow |
| FSR | Fast Snapshot Restore (pre-init) | Snapshot feature | Full speed on first touch after restore |

EBS volume types by workload

There are four types worth provisioning in 2026. Pick by access pattern, not by habit. The per-volume ceilings are the volume’s maximum — the instance ceiling (next section) is often what actually binds.

| Type | Media | Best for | Max IOPS / vol | Max throughput / vol | Boot? |
|---|---|---|---|---|---|
| gp3 | SSD | General purpose; boot, most apps, mid-tier DBs | 16,000 | 1,000 MiB/s | Yes |
| io2 Block Express | SSD | Latency-sensitive, high-IOPS DBs; sub-ms, durable | 256,000 | 4,000 MiB/s | Yes |
| st1 | HDD | Large sequential, throughput-bound (logs, big-data scans) | 500 | 500 MiB/s | No |
| sc1 | HDD | Cold, infrequently accessed, lowest cost | 250 | 250 MiB/s | No |

The full attribute grid — every type side by side on the dimensions that decide a pick, including the legacy gp2/io1 you’ll meet on existing fleets:

| Attribute | gp3 | io2 Block Express | gp2 (legacy) | io1 (legacy) | st1 | sc1 |
|---|---|---|---|---|---|---|
| Media | SSD | SSD | SSD | SSD | HDD | HDD |
| Min / max size | 1 GiB – 16 TiB | 4 GiB – 64 TiB | 1 GiB – 16 TiB | 4 GiB – 16 TiB | 125 GiB – 16 TiB | 125 GiB – 16 TiB |
| Max IOPS / volume | 16,000 | 256,000 | 16,000 | 64,000 | 500 | 250 |
| Max throughput / volume | 1,000 MiB/s | 4,000 MiB/s | 250 MiB/s | 1,000 MiB/s | 500 MiB/s | 250 MiB/s |
| Baseline | 3,000 IOPS / 125 MiB/s | (you provision) | 3 IOPS/GiB (coupled) | (you provision) | credit-based | credit-based |
| IOPS:capacity ratio | ≤ 500 IOPS/GiB | ≤ 1,000 IOPS/GiB | 3 IOPS/GiB | ≤ 50 IOPS/GiB | n/a | n/a |
| Durability | 99.8–99.9% | 99.999% | 99.8–99.9% | 99.8–99.9% | 99.8–99.9% | 99.8–99.9% |
| Bootable | Yes | Yes | Yes | Yes | No | No |
| Multi-Attach | No | Yes (≤ 16) | No | Yes (≤ 16) | No | No |
| Best for | Default; most apps | Sub-ms / > 16k IOPS | (migrate to gp3) | (migrate to io2) | Sequential | Cold |

The decision rules I apply:

Rule of thumb: if the workload is random and small-block (databases, busy filesystems), it is an IOPS problem, so use SSD (gp3/io2). If it is large and sequential (log ingestion, analytics scans), it is a throughput problem, so consider st1, but measure, because a well-provisioned gp3 at 1,000 MiB/s often wins on latency.

Picking by the numbers — a decision table

When the workload is described in plain terms, this maps it to a type without debate:

| If the workload is… | It’s probably… | Provision | Why |
|---|---|---|---|
| Boot/root volume, mixed app I/O | General-purpose | gp3 (3,000/125 default) | Cheapest sane default; bootable |
| OLTP DB, random 4–16 KiB, < 16,000 IOPS | IOPS-bound, moderate | gp3 with raised IOPS | Decoupled IOPS, fraction of io2 cost |
| OLTP DB needing > 16,000 IOPS or sub-ms p99 | IOPS-bound, extreme | io2 Block Express | Only type that exceeds gp3’s ceiling |
| Volume > 16 TiB with high IOPS | Large + high-IOPS | io2 Block Express | gp3 caps at 16 TiB / 16,000 IOPS |
| Log/stream ingestion, large sequential writes | Throughput-bound | st1 (or gp3 @ 1,000 MiB/s) | HDD cheap for sequential; measure latency |
| Cold archive on a block device, rare reads | Cost-floor | sc1 | Lowest $/GiB block tier |
| Shared across many instances / AZs | File, not block | EFS (not EBS) | EBS is single-AZ, single-attach by default |

What each type costs you to get wrong

The mis-picks I see most, and what they cost:

| Mistake | Looks like | Actual cost | Correct move |
|---|---|---|---|
| gp2 on a new system | “It’s always worked” | Pays more for less; IOPS coupled to size | Migrate to gp3 (online) |
| io2 for a gp3 workload | Over-engineered DB volume | 3–5× the price for unused ceiling | Right-size to gp3 with provisioned IOPS |
| st1/sc1 for random I/O | Terrible DB latency | HDD seeks kill small random ops | SSD (gp3/io2) |
| gp3 left at 3,000/125 default | “Why is it slow?” | Throttled to baseline despite headroom | Raise provisioned IOPS/throughput |
| HDD as a boot volume | Won’t boot | Hard failure | gp3 for root |

Decoupling IOPS, throughput, and capacity

The single most useful property of gp3 and io2 is that the three dimensions are separately provisionable. On gp2, IOPS scaled with size (3 IOPS/GiB), so you used to oversize a volume just to buy performance. That coupling is gone.
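To see what the old coupling cost, here is a small sketch of how much gp2 capacity you had to buy just to reach an IOPS target. It uses only the 3 IOPS/GiB figure from the text and deliberately ignores gp2 burst credits; the 8,000 IOPS target is an arbitrary example:

```python
import math

GP2_IOPS_PER_GIB = 3  # coupling quoted in the text


def gp2_size_for_iops(target_iops):
    """GiB of gp2 needed for `target_iops` of baseline performance (burst ignored)."""
    return math.ceil(target_iops / GP2_IOPS_PER_GIB)


target = 8_000
print(f"gp2 needs {gp2_size_for_iops(target):,} GiB to sustain {target:,} IOPS")
```

On gp3 the same IOPS figure is a setting on a volume sized purely for the data it holds.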

gp3 baseline is 3,000 IOPS and 125 MiB/s at any size, and you provision above that up to 16,000 IOPS and 1,000 MiB/s. The throughput ceiling you can buy also scales with provisioned IOPS — you get up to 0.25 MiB/s per IOPS, so 1,000 MiB/s requires at least 4,000 provisioned IOPS.

resource "aws_ebs_volume" "data" {
  availability_zone = "us-east-1a"
  size              = 200    # GiB, sized for capacity only
  type              = "gp3"
  iops              = 8000   # decoupled from size
  throughput        = 500    # MiB/s, decoupled from size
  encrypted         = true
  kms_key_id        = aws_kms_key.ebs.arn
}

For io2, you provision IOPS directly, bounded by a ratio of IOPS to capacity (up to 1,000 IOPS/GiB), and Block Express raises the per-volume ceiling to 256,000 IOPS and 4,000 MiB/s:

resource "aws_ebs_volume" "oltp" {
  availability_zone = "us-east-1a"
  size              = 500
  type              = "io2"      # Block Express on supported Nitro instances
  iops              = 64000      # ratio allows 1,000 IOPS/GiB (500 GiB -> 500k), but the per-volume cap is 256,000
  encrypted         = true
}
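How the ratio and the per-volume ceiling interact can be sketched as a single clamp. The constants are the figures quoted in this article; the function name is hypothetical:

```python
IO2_IOPS_PER_GIB = 1_000              # ratio quoted in the text
IO2_BLOCK_EXPRESS_MAX_IOPS = 256_000  # per-volume ceiling quoted in the text


def io2_max_iops(size_gib):
    """Highest IOPS figure you can provision on an io2 volume of this size."""
    return min(size_gib * IO2_IOPS_PER_GIB, IO2_BLOCK_EXPRESS_MAX_IOPS)


for size_gib in (64, 256, 500):
    print(f"{size_gib} GiB -> up to {io2_max_iops(size_gib):,} IOPS")
```

Below 256 GiB the ratio is the limit; above it the per-volume ceiling takes over, so a 500 GiB volume can be provisioned to 256,000 IOPS, not 500,000.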

The dimensions and their ratios, side by side

Every provisionable dimension, its range, its default, and the ratio that bounds it:

| Dimension | gp3 range | gp3 default | io2 range | Ratio / bound | Gotcha |
|---|---|---|---|---|---|
| Capacity (size) | 1 GiB – 16 TiB | (you set) | 4 GiB – 64 TiB | io2: IOPS ≤ 1,000 × GiB | Shrinking is not supported; create a smaller volume and copy |
| Provisioned IOPS | 3,000 – 16,000 | 3,000 | 100 – 256,000 (Block Express) | gp3: ≤ 500 IOPS/GiB | Above 16,000 needs io2, not gp3 |
| Provisioned throughput | 125 – 1,000 MiB/s | 125 MiB/s | up to 4,000 MiB/s (Block Express) | gp3: ≤ 0.25 MiB/s per IOPS | 1,000 MiB/s needs ≥ 4,000 IOPS |
| Throughput-per-IOPS | derived | derived | derived | gp3 hard rule | Buying MiB/s without IOPS is rejected |
| Durability | 99.8–99.9% | | 99.999% | | io2 is the durability tier |

The gp3 throughput-per-IOPS trap

This catches people who raise throughput without raising IOPS. To buy a given throughput on gp3, you need at least throughput / 0.25 provisioned IOPS:

| Target throughput | Minimum gp3 IOPS required | Why |
|---|---|---|
| 125 MiB/s (baseline) | 3,000 (baseline) | Free with baseline |
| 250 MiB/s | 1,000 (covered by 3,000 baseline) | Within baseline IOPS |
| 500 MiB/s | 2,000 (covered by 3,000 baseline) | Within baseline IOPS |
| 750 MiB/s | 3,000 | Exactly at baseline IOPS |
| 1,000 MiB/s | 4,000 | Must raise IOPS above baseline |
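The rules above are easy to check before you run terraform apply. The sketch below is a hypothetical pre-flight validator built only from the gp3 figures quoted in this article (3,000 to 16,000 IOPS, 125 to 1,000 MiB/s, 0.25 MiB/s per IOPS, 500 IOPS/GiB); it assumes the 500 IOPS/GiB ratio only matters once you provision above the baseline, and it is not an AWS API:

```python
import math

GP3_BASELINE_IOPS = 3_000
GP3_MAX_IOPS = 16_000
GP3_BASELINE_MIB_S = 125
GP3_MAX_MIB_S = 1_000
GP3_MIB_S_PER_IOPS = 0.25
GP3_IOPS_PER_GIB = 500


def min_iops_for_throughput(throughput_mib_s):
    """Smallest IOPS setting that backs the requested throughput."""
    needed = math.ceil(throughput_mib_s / GP3_MIB_S_PER_IOPS)
    return max(GP3_BASELINE_IOPS, needed)


def gp3_problems(size_gib, iops, throughput_mib_s):
    """Return a list of rule violations; an empty list means the settings fit."""
    problems = []
    if not GP3_BASELINE_IOPS <= iops <= GP3_MAX_IOPS:
        problems.append("iops outside 3,000-16,000")
    if not GP3_BASELINE_MIB_S <= throughput_mib_s <= GP3_MAX_MIB_S:
        problems.append("throughput outside 125-1,000 MiB/s")
    if iops > GP3_BASELINE_IOPS and iops > size_gib * GP3_IOPS_PER_GIB:
        problems.append("iops exceeds 500 per GiB")
    if throughput_mib_s > iops * GP3_MIB_S_PER_IOPS:
        needed = min_iops_for_throughput(throughput_mib_s)
        problems.append(f"throughput needs at least {needed} IOPS")
    return problems


print(gp3_problems(size_gib=200, iops=8_000, throughput_mib_s=500))
print(gp3_problems(size_gib=200, iops=3_000, throughput_mib_s=1_000))
```

The second call is the trap in miniature: throughput raised to 1,000 MiB/s while IOPS stayed at the 3,000 baseline.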

Modifying a volume online with Elastic Volumes

Modifying a volume in place is online via Elastic Volumes — no detach, no downtime:

aws ec2 modify-volume \
  --volume-id vol-0abc123 \
  --volume-type gp3 \
  --iops 10000 \
  --throughput 700

# Watch the modification progress; the volume stays attached and usable
aws ec2 describe-volumes-modifications --volume-id vol-0abc123 \
  --query 'VolumesModifications[0].[ModificationState,Progress]' --output text

Two operational caveats that bite people: before a modification completes, the volume passes through an optimizing state in which performance sits between the old and new settings for a while, and a given volume can be modified only once every 6 hours. Plan changes; don’t thrash them. After a size increase you must also grow the partition and filesystem inside the OS. The block device is bigger, but the filesystem doesn’t know until you tell it:

sudo growpart /dev/nvme0n1 1      # extend the partition to fill the device
sudo xfs_growfs -d /             # xfs: grow to the partition
# (ext4 equivalent: sudo resize2fs /dev/nvme0n1p1)

The Elastic Volumes operations, what is online, and the constraints:

| Operation | Online? | Reversible? | Constraint | After-step required |
|---|---|---|---|---|
| Change type (gp2→gp3, gp3→io2) | Yes | Yes (with cooldown) | 6 h between modifications | None |
| Raise IOPS | Yes | Yes | 6 h cooldown; optimizing state | None |
| Raise throughput (gp3) | Yes | Yes | Needs IOPS to back it | None |
| Grow size | Yes | No (cannot shrink) | 6 h cooldown | growpart + xfs_growfs/resize2fs |
| Shrink size | Not supported | | Must create new + copy | Migrate data |

The instance bandwidth ceiling

This is the section that saves the most money. A volume’s provisioned numbers are a maximum the volume can do — the instance imposes its own EBS bandwidth and IOPS limits, and those are usually lower. AWS publishes per-instance “EBS-optimized” limits: a baseline and a 30-minute burst (on smaller sizes), measured at a 16 KiB block size.

Concretely: an m6i.large tops out around 10,000 IOPS and 4,750 Mbps (~594 MiB/s) of dedicated EBS bandwidth. Attaching a single gp3 provisioned for 16,000 IOPS and 1,000 MiB/s to that instance is wasted spend — the instance caps you at roughly 60% of the throughput and 62% of the IOPS you paid for. The fix is to size the instance to the storage need, or aggregate volumes when the instance has headroom.

Check the limits before you provision the volume:

# Note: no backslashes inside the single-quoted --query; the shell would pass
# them through literally and the JMESPath expression would fail to parse.
aws ec2 describe-instance-types \
  --instance-types m6i.large m6i.4xlarge \
  --query 'InstanceTypes[].{
     type:InstanceType,
     baseIOPS:EbsInfo.EbsOptimizedInfo.BaselineIops,
     burstIOPS:EbsInfo.EbsOptimizedInfo.MaximumIops,
     baseMBps:EbsInfo.EbsOptimizedInfo.BaselineThroughputInMBps,
     burstMBps:EbsInfo.EbsOptimizedInfo.MaximumThroughputInMBps}' \
  --output table

Smaller instances get an unlimited-duration baseline plus a burst bucket; the larger sizes in a family deliver their maximum continuously. If your workload is sustained (a busy database), size against the baseline, not the burst, or you will fall off a cliff after 30 minutes. On modern instances EBS optimization is on by default and not billable; on older types you may still need --ebs-optimized.
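The arithmetic behind the m6i.large example is short enough to keep in a scratch file. This sketch takes the figures quoted above (4,750 Mbps and 10,000 IOPS on the instance, 1,000 MiB/s and 16,000 IOPS on the volume) as inputs rather than as authoritative limits; substitute the baseline values that describe-instance-types returns for your own instance. Dividing Mbps by 8 gives the MB/s figure the API reports, which this article treats as roughly MiB/s:

```python
def mbps_to_mb_per_s(mbps):
    """Convert a link speed in megabits per second to megabytes per second."""
    return mbps / 8


def usable_share(volume_limit, instance_limit):
    """Fraction of the provisioned volume figure the instance lets you use."""
    return min(volume_limit, instance_limit) / volume_limit


instance_mb_s = mbps_to_mb_per_s(4_750)
print(f"instance throughput cap: {instance_mb_s} MB/s")
print(f"throughput you can use:  {usable_share(1_000, instance_mb_s):.1%}")
print(f"IOPS you can use:        {usable_share(16_000, 10_000):.1%}")
```

Anything under 100% on a sustained workload is spend you can recover, either by lowering the volume settings or by moving to an instance whose baseline covers them.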

Representative instance EBS limits (general-purpose families)

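Treat the figures below as illustrative of the shape (baseline scales with size; smaller sizes burst) rather than as authoritative. Published limits differ by family and generation, and on bursting sizes the sustained baseline can sit well below the burst maximum, so confirm both the baseline and the maximum values with describe-instance-types before committing to a size: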

| Instance | EBS baseline (Mbps) | EBS baseline (MiB/s) | Baseline IOPS | Bursts? | What a 1,000 MiB/s volume gets |
|---|---|---|---|---|---|
| m6i.large | 4,750 | ~594 | 10,000 | Yes (30 min) | ~594 MiB/s sustained (capped) |
| m6i.xlarge | 6,000 | ~750 | 20,000 | Yes (30 min) | ~750 MiB/s sustained (capped) |
| m6i.2xlarge | 10,000 | ~1,250 | 40,000 | Yes (30 min) | Full 1,000 MiB/s (headroom) |
| m6i.4xlarge | 10,000 | ~1,250 | 40,000 | No (sustained) | Full 1,000 MiB/s (headroom) |
| m6i.8xlarge | 10,000 | ~1,250 | 40,000 | No | Full 1,000 MiB/s |
| m6i.16xlarge | 20,000 | ~2,500 | 80,000 | No | Full + room to stripe |
| r6i.2xlarge | 10,000 | ~1,250 | 40,000 | No | Full 1,000 MiB/s |
| r5.2xlarge | 4,750 | ~594 | up to 18,750 | Yes (30 min) | ~594 MiB/s (capped; the scenario) |
| c6i.4xlarge | 10,000 | ~1,250 | 40,000 | No | Full 1,000 MiB/s |
| m6i.metal / .32xlarge | 40,000 | ~5,000 | 100,000 | No | Stripe many volumes |
| r6id.32xlarge | 80,000 | ~10,000 | 260,000 | No | io2 Block Express headroom |

A second reading of that table: the family sets the per-vCPU ratio, but the size sets whether you burst or run at the maximum continuously. Map a storage demand to the smallest instance that meets it on the baseline:

| Sustained storage demand | Smallest instance that meets baseline | Don’t pick | Why |
|---|---|---|---|
| ≤ 600 MiB/s, ≤ 10k IOPS | m6i.large (~594), or one size up for margin | smaller bursting size for a 24/7 DB | Burst ends after 30 min |
| ~750 MiB/s | m6i.xlarge (~750) | m6i.large (caps ~594) | Volume would be throttled |
| ~1,000 MiB/s | m6i.2xlarge / .4xlarge (~1,250) | anything ≤ m6i.xlarge | Need headroom above 1,000 |
| ~2,000+ MiB/s (striped) | m6i.16xlarge (~2,500) | mid sizes | Stripe needs instance headroom |
| > 4,000 MiB/s, > 100k IOPS | .metal / r6id.32xlarge | general sizes | Only big sizes + io2 reach this |

Baseline vs burst — the cliff that bites sustained workloads

The distinction that turns a passing benchmark into a 3am incident:

| Aspect | Baseline | Burst |
|---|---|---|
| Duration | Unlimited | ~30 minutes per 24 h (credit-based) |
| Which sizes get burst | Smaller sizes in a family | Larger sizes deliver max continuously |
| Sustained DB workload | Size against this | Ignore; you’ll fall off after 30 min |
| Short batch / spiky | Can lean on burst | Fine within the credit window |
| Symptom of relying on burst | Fast for 30 min, then throttled | Latency spike exactly at the half-hour mark |

Striping to beat the single-volume ceiling

When one instance has bandwidth headroom but a single volume’s per-volume ceiling is the limit, stripe. A RAID 0 across N gp3 volumes multiplies the volume ceilings — up to the instance limit:

# Two gp3 volumes, each provisioned for high throughput, striped
sudo mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme1n1 /dev/nvme2n1
sudo mkfs.xfs /dev/md0
sudo mount /dev/md0 /data

RAID 0 gives no redundancy — rely on EBS’s own durability and snapshots, and know that a snapshot of a striped set is not crash-consistent across members unless you freeze the filesystem first. When striping helps and when it doesn’t:

| Situation | Stripe? | Why |
|---|---|---|
| Need > 1,000 MiB/s, instance allows > that | Yes | Aggregate N gp3 volumes to instance limit |
| Need > 16,000 IOPS, but io2 too costly | Sometimes | N gp3 volumes can exceed one gp3’s IOPS |
| Instance baseline already the cap | No | Striping can’t exceed the instance ceiling |
| Single volume already meets demand | No | Adds complexity + risk for nothing |
| Need redundancy at the volume layer | No (not RAID 0) | RAID 0 = zero redundancy; rely on snapshots |
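Whether a stripe is worth building is a two-line calculation: the stripe's ceiling is N times the per-volume ceiling, clamped to the instance limit. A rough sketch, with hypothetical function names and example limits you should replace with your own:

```python
import math


def stripe_ceiling(n_volumes, per_volume_limit, instance_limit):
    """Aggregate ceiling of an N-way RAID 0, capped by the instance."""
    return min(n_volumes * per_volume_limit, instance_limit)


def volumes_needed(target, per_volume_limit):
    """Smallest stripe width whose volumes can reach the target."""
    return math.ceil(target / per_volume_limit)


# Example: gp3 volumes at 1,000 MiB/s each on an instance that allows 2,500 MiB/s.
for n in (1, 2, 3, 4):
    print(n, "volumes ->", stripe_ceiling(n, 1_000, 2_500), "MiB/s")
print("volumes for 2,000 MiB/s:", volumes_needed(2_000, 1_000))
```

Once the result stops growing as N increases, the instance is the binding term and another volume only adds cost and failure surface.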

Multi-Attach, Fast Snapshot Restore, and snapshot lifecycle

Multi-Attach lets a single io2 (or io1) volume attach to up to 16 Nitro instances in the same AZ concurrently. It is not a magic shared disk — it provides no coordination. You must run a cluster-aware filesystem (GFS2, OCFS2) or an application that arbitrates writes; mounting xfs/ext4 read-write on two instances corrupts the volume. Use it for clustered, fence-aware software, not as a poor man’s EFS.

Fast Snapshot Restore (FSR) removes the lazy-load penalty. Normally a volume restored from a snapshot loads blocks from S3 on first touch, so the first read of each block is slow. With FSR enabled on a snapshot, volumes created from it in the enabled AZs arrive fully initialized and deliver full provisioned performance immediately, which is essential for golden-image boot volumes and for restoring large data volumes into service quickly. It is billed per snapshot, per AZ, per hour while enabled.

aws ec2 enable-fast-snapshot-restores \
  --availability-zones us-east-1a us-east-1b \
  --source-snapshot-ids snap-0abc123

When to reach for each of these features

The three features here solve different problems and are mutually independent:

| Feature | Solves | Use when | Hard rule / limit | Cost shape |
|---|---|---|---|---|
| Multi-Attach (io2/io1) | One volume, many readers/writers | Clustered, fence-aware software | Up to 16 Nitro instances, same AZ; cluster FS only | Volume cost only |
| Fast Snapshot Restore | Slow first-touch after restore | Golden images, time-critical DR restores | Billed per AZ per hour while enabled | Hourly per AZ + per snapshot |
| Data Lifecycle Manager | Manual snapshot scripts | Any scheduled backup + retention | Policy-driven; tag-targeted | No charge for DLM itself |

Automating snapshots with Data Lifecycle Manager

Automate retention with Data Lifecycle Manager rather than cron jobs and Lambda glue. A policy that snapshots nightly, keeps 14, and copies to a DR Region:

{
  "ResourceTypes": ["VOLUME"],
  "TargetTags": [{ "Key": "Backup", "Value": "daily" }],
  "Schedules": [
    {
      "Name": "daily-14d",
      "CreateRule": { "Interval": 24, "IntervalUnit": "HOURS", "Times": ["03:00"] },
      "RetainRule": { "Count": 14 },
      "CopyTags": true,
      "CrossRegionCopyRules": [
        {
          "TargetRegion": "us-west-2",
          "Encrypted": true,
          "CmkArn": "arn:aws:kms:us-west-2:111122223333:key/abcd-1234",
          "RetainRule": { "Interval": 14, "IntervalUnit": "DAYS" }
        }
      ]
    }
  ]
}

EBS snapshots are incremental and block-level: only changed blocks since the last snapshot are stored, so frequent snapshots are cheap. Deleting an old snapshot never breaks a newer one — AWS re-references the blocks the newer snapshot still needs. The snapshot facts that govern cost and recovery:

| Property | Behaviour | Implication |
|---|---|---|
| Incremental | Only changed blocks since last snapshot stored | Frequent snapshots are cheap |
| Deletion safety | Newer snapshots keep blocks they need | Deleting an old snapshot never breaks a newer one |
| First restore (no FSR) | Blocks lazy-loaded from S3 on first touch | First read is slow; not the steady-state number |
| FSR enabled | Volume pre-initialized | Full performance on first touch |
| Cross-Region copy | Re-encrypts with target-Region CMK | DR copies need a key in the target Region |
| Crash consistency (striped set) | Not consistent across members unless frozen | Freeze the filesystem before snapshotting a RAID set |

EFS performance modes, throughput modes, and elastic throughput

EFS is NFSv4.1, multi-AZ, and grows automatically. Its performance is governed by two orthogonal settings that people routinely confuse.

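Performance mode (set at creation, immutable): General Purpose gives the lowest per-operation latency and is the default for most workloads; Max I/O trades higher per-operation latency for a higher aggregate ceiling and is mainly a legacy choice for massively parallel, latency-tolerant batch.

Throughput mode (changeable, subject to a cooldown): Elastic scales automatically and bills for data transferred; Provisioned delivers a fixed MiB/s that you pay for whether you use it or not; Bursting ties baseline throughput to stored data (50 KiB/s per GiB) plus burst credits.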

resource "aws_efs_file_system" "shared" {
  encrypted        = true
  performance_mode = "generalPurpose"
  throughput_mode  = "elastic"     # scales automatically, pay-per-use

  lifecycle_policy {
    transition_to_ia                    = "AFTER_30_DAYS"
    transition_to_primary_storage_class = "AFTER_1_ACCESS"
  }
}

Performance mode — the immutable choice

You set this once at creation and cannot change it later; choose deliberately:

| Performance mode | Latency | Aggregate ceiling | Choose when | Immutable after creation? |
|---|---|---|---|---|
| General Purpose | Lowest per-op | High (paired with Elastic) | Default; interactive, latency-sensitive, most workloads | Yes |
| Max I/O | Higher per-op | Highest aggregate IOPS | Legacy: massively parallel, latency-tolerant batch | Yes |

Throughput mode — the changeable choice

This you can change, but decreases and mode switches are rate-limited (roughly a day cooldown):

| Throughput mode | How throughput is set | You pay for | Best for | Failure mode |
|---|---|---|---|---|
| Elastic | Auto-scales with demand | Data transferred (per GB) | Spiky / unpredictable; the default | Metered per-GB cost adds up on steady very-high load |
| Provisioned | Fixed MiB/s you set | Provisioned MiB/s (whether used or not) | Steady, known high throughput on a small FS | Paying for headroom you don’t use |
| Bursting | Scales with stored data (50 KiB/s per GiB) + credits | Storage only | Large filesystems with bursty access | Near-empty FS starves when credits run out |

Choosing between the three modes is a function of size, access shape, and steadiness. This decision table resolves it:

| If the filesystem is… | And access is… | Choose | Why |
|---|---|---|---|
| Small (< 1 TiB) | Spiky / unpredictable | Elastic | No baseline cliff; pay per GB |
| Small (< 1 TiB) | Steady, known high throughput | Provisioned | Fixed MiB/s cheaper than per-GB at steady high load |
| Large (> 5 TiB) | Bursty | Bursting | Baseline (50 KiB/s per GiB) is already large; cheapest |
| Any size | Unknown / changing | Elastic | Safe default; auto-scales, no provisioning |
| Near-empty | Anything | Elastic (never Bursting) | Bursting starves with almost no baseline |
| Very large, steady max | Sustained ceiling | Provisioned (if cheaper than Elastic per-GB) | Compare metered Elastic cost vs flat Provisioned |

Switching to Provisioned for a steady high-throughput job:

aws efs update-file-system \
  --file-system-id fs-0abc123 \
  --throughput-mode provisioned \
  --provisioned-throughput-in-mibps 256

Throughput-mode changes and decreases in provisioned throughput are rate-limited (you can raise it, but reducing it or switching modes has a cooldown of roughly a day), so don’t treat it as an autoscaling knob.

Why Bursting starves — the math

Bursting baseline is 50 KiB/s per GiB stored. A small filesystem has a tiny baseline and survives only on credits; once they’re gone it crawls. This table is the single most useful EFS diagnostic:

| Stored data | Baseline throughput | Burst throughput (while credits last) | Verdict on Bursting |
|---|---|---|---|
| 100 GiB | ~5 MiB/s | ~100 MiB/s | Starves fast; use Elastic |
| 500 GiB | ~25 MiB/s | ~100 MiB/s | Marginal; Elastic safer |
| 1 TiB | ~50 MiB/s | ~100 MiB/s | Workable if access is bursty |
| 10 TiB | ~500 MiB/s | higher | Bursting genuinely cheap and adequate |
| Empty / near-empty | near zero | drains immediately | The classic “EFS is slow” ticket |
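The baseline column is nothing more than 50 KiB/s multiplied by the GiB stored. A short sketch that reproduces it, so you can plug in your own filesystem size (the function name is mine; the rate is the one quoted above):

```python
BASELINE_KIB_S_PER_GIB = 50  # Bursting-mode baseline rate quoted in the text


def bursting_baseline_mib_s(stored_gib):
    """Sustained throughput a Bursting filesystem earns from its stored data."""
    return stored_gib * BASELINE_KIB_S_PER_GIB / 1024


for stored_gib in (100, 500, 1024, 10 * 1024):
    print(f"{stored_gib:>6} GiB -> {bursting_baseline_mib_s(stored_gib):.1f} MiB/s")
```

If the result is below what the workload needs around the clock, Bursting will only hold up for as long as the credit balance lasts.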

EFS storage classes, lifecycle, and access points

EFS has Standard and Infrequent Access (IA) classes (plus One Zone variants for single-AZ cost savings). Lifecycle management moves files between Standard and IA based on access age; the transition_to_primary_storage_class = "AFTER_1_ACCESS" rule above promotes a file back to Standard the moment it is read again, which keeps the IA per-access read charge from punishing hot files that aged out. For most shared filesystems IA cuts storage cost substantially with negligible behavioral change, because access is usually heavily skewed: a small share of the files takes most of the reads.

The EFS storage classes side by side

| Storage class | Durability scope | $/GiB (relative) | Access charge | Use for |
|---|---|---|---|---|
| Standard | Multi-AZ | Baseline | None | Hot, frequently-read files |
| Standard-IA | Multi-AZ | Much lower | Per-GB read fee | Cold files in a multi-AZ FS |
| One Zone | Single-AZ | Lower than Standard | None | Reproducible / non-critical data |
| One Zone-IA | Single-AZ | Lowest | Per-GB read fee | Cold + reproducible |

Lifecycle transition rules

The transition knobs and what each does:

| Lifecycle setting | Values | Effect | When to use |
| --- | --- | --- | --- |
| transition_to_ia | AFTER_1/7/14/30/60/90_DAYS | Demote untouched files to IA after N days | Almost always; big storage savings |
| transition_to_primary_storage_class | AFTER_1_ACCESS | Promote a file back to Standard on read | Avoid repeated IA read fees on re-hot files |
| (no lifecycle) | n/a | Everything stays Standard | Only if all data is uniformly hot |

Access points are the right way to hand EFS to multiple applications or containers. Each enforces a POSIX identity and a root directory, so an app physically cannot see another tenant’s files:

resource "aws_efs_access_point" "app_a" {
  file_system_id = aws_efs_file_system.shared.id

  posix_user {
    uid = 1000
    gid = 1000
  }

  root_directory {
    path = "/app-a"
    creation_info {
      owner_uid   = 1000
      owner_gid   = 1000
      permissions = "0750"
    }
  }
}

Pair access points with a filesystem policy that requires TLS and IAM authorization, so a leaked mount target is useless without credentials:

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Deny",
    "Principal": { "AWS": "*" },
    "Action": "*",
    "Resource": "*",
    "Condition": { "Bool": { "aws:SecureTransport": "false" } }
  }]
}

Mount with the EFS helper so encryption-in-transit and the access point are wired correctly:

sudo mount -t efs -o tls,accesspoint=fsap-0abc123 fs-0abc123:/ /mnt/app-a

EFS mount options that affect performance and safety

The mount flags that matter, and what each buys:

| Mount option | What it does | Default | When to set |
| --- | --- | --- | --- |
| tls | Encryption in transit via stunnel | Off (helper adds it) | Always in production |
| accesspoint=fsap-... | Enforce POSIX root + identity | None | Multi-tenant / per-app isolation |
| iam | Authenticate the mount with IAM | Off | When filesystem policy requires IAM |
| nconnect=N | Multiple TCP connections per mount | 1 | Not on EFS: AWS documents nconnect as unsupported, so add parallelism with more threads or clients instead |
| noresvport | Reconnect on a new port after a blip | On (helper) | Resilience across network events |
| _netdev (fstab) | Wait for network before mount | n/a | Boot-time mounts in /etc/fstab |

Benchmarking with fio and interpreting results

Never trust the spec sheet — measure the path you actually run. fio is the tool. Match the block size and pattern to your workload: 16 KiB random for database-like I/O, large sequential for streaming.

Random read IOPS (database-style), with O_DIRECT to bypass the page cache so you measure the device, not RAM:

sudo fio --name=randread --filename=/data/fiotest --direct=1 \
  --rw=randread --bs=16k --iodepth=64 --numjobs=4 --group_reporting \
  --size=10G --runtime=120 --time_based --ioengine=libaio

Sequential throughput (analytics/log-streaming style):

sudo fio --name=seqread --filename=/data/fiotest --direct=1 \
  --rw=read --bs=1M --iodepth=32 --numjobs=2 --group_reporting \
  --size=20G --runtime=120 --time_based --ioengine=libaio

Match the fio profile to your real workload

The number one benchmarking error is testing a pattern the application never runs. Map the workload to the right block size, pattern, and queue depth before you draw any conclusion:

| Workload | Pattern (--rw) | Block size (--bs) | iodepth | numjobs | Limit it stresses |
| --- | --- | --- | --- | --- | --- |
| OLTP database (random reads) | randread | 8k–16k | 32–64 | 4–8 | IOPS |
| OLTP database (mixed) | randrw (70/30) | 8k–16k | 32–64 | 4–8 | IOPS + fsync latency |
| Log / stream ingestion (writes) | write | 1M | 16–32 | 2–4 | Throughput |
| Analytics scan (sequential reads) | read | 1M | 32 | 2–4 | Throughput |
| Boot / small mixed | randrw | 4k | 16 | 1–2 | Latency |
| Latency probe (single op) | randread | 4k | 1 | 1 | p50/p99 latency |
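The "Limit it stresses" column follows from one identity: bandwidth = IOPS × block size. Small blocks run out of IOPS long before they move much data; large blocks run out of throughput first. A short sketch of the conversion, where `iops_to_mibps` is a hypothetical helper name:

```shell
# Hypothetical helper: bandwidth (MiB/s) implied by an IOPS rate at a fixed block size.
# Usage: iops_to_mibps <iops> <block_size_kib>
iops_to_mibps() {
  awk -v n="$1" -v kib="$2" 'BEGIN { printf "%.1f\n", n * kib / 1024 }'
}

iops_to_mibps 3000 16    # 46.9  -> gp3 default IOPS at 16 KiB is far below 125 MiB/s
iops_to_mibps 8000 16    # 125.0
iops_to_mibps 1000 1024  # 1000.0 -> 1 MiB blocks reach 1,000 MiB/s at only 1,000 IOPS
```

Use it to sanity-check a fio profile before running it: if the product of your target IOPS and block size is nowhere near the throughput limit, the run can only tell you about the IOPS ceiling.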

The fio knobs and why each matters

Getting these wrong is how you “prove” a volume is slow when your test was the bottleneck:

| fio flag | What it controls | Set it to | If wrong you measure |
| --- | --- | --- | --- |
| --direct=1 | Bypass the OS page cache (O_DIRECT) | Always 1 for device tests | RAM, not the volume |
| --bs | Block size | 4–16k random (DB); 1M sequential | The wrong workload’s profile |
| --rw | Pattern | randread/randwrite/read/write/randrw | A pattern your app never runs |
| --iodepth | Outstanding I/Os per job | Deep (32–64) to saturate | Under-driven device (looks slow) |
| --numjobs | Parallel worker threads | Match cores / concurrency | Single-threaded ceiling, not the volume’s |
| --runtime + --time_based | Duration | ≥ 120 s to ride past burst | A burst window, not steady state |
| --ioengine | I/O submission path | libaio on Linux | A slower engine’s overhead |

Reading the output

What each fio number tells you and what to do next:

| fio metric | Compare against | If you hit the volume number | If you hit the instance number | If you hit neither |
| --- | --- | --- | --- | --- |
| IOPS | min(vol IOPS, instance IOPS) | Raise volume IOPS or stripe | Resize the instance | Deeper iodepth/numjobs; check FS/app |
| bw (MiB/s) | min(vol throughput, instance EBS bw) | Raise volume throughput or stripe | Resize the instance | Larger block size; more parallel jobs |
| clat/lat p50 | gp3 ~single-digit ms; io2 sub-ms | Expected; healthy | n/a | Investigate FS / fsync / network |
| clat/lat p99 | Should track p50 under healthy load | Queueing: lower iodepth | Queueing at the instance cap | Outliers: noisy neighbour / GC |

A fresh volume restored from a snapshot without FSR will read slowly on first touch. That is lazy loading, not the steady-state number. Either enable FSR or pre-warm by reading every block before you benchmark.
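The IOPS and bandwidth rows of the table reduce to a three-way decision. The sketch below encodes it under one assumption that is not from the article: a number counts as "hitting" a limit when it reaches at least 95% of it. The helper name `binding_limit` is hypothetical.

```shell
# Hypothetical helper: name the ceiling a measured fio number is pressing against.
# Usage: binding_limit <achieved> <volume_limit> <instance_limit>
# Assumption: "hitting" a limit means reaching at least 95% of it.
binding_limit() {
  awk -v a="$1" -v v="$2" -v i="$3" 'BEGIN {
    cap = (v < i) ? v : i              # reachable ceiling = min(volume, instance)
    if (a < 0.95 * cap) print "neither"
    else if (i < v)     print "instance"
    else                print "volume"
  }'
}

binding_limit 590 1000 593.75   # instance
binding_limit 2990 3000 12000   # volume
binding_limit 900 3000 12000    # neither
```

Feed it the fio result together with the numbers from describe-volumes and describe-instance-types; "neither" means the test or the application is the limit, so deepen the queue before changing anything you pay for.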

Confirming the real limit end-to-end (CloudWatch)

Confirm the storage is performing to the limit that actually applies, end to end.

# Time window for the CloudWatch queries: the last hour, in UTC.
# Tries GNU date (Linux) first, then falls back to BSD/macOS syntax.
START=$(date -u -d '1 hour ago' '+%Y-%m-%dT%H:%M:%SZ' 2>/dev/null \
  || date -u -v-1H '+%Y-%m-%dT%H:%M:%SZ')
END=$(date -u '+%Y-%m-%dT%H:%M:%SZ')

# 1. Confirm provisioned volume settings took effect
aws ec2 describe-volumes --volume-ids vol-0abc123 \
  --query 'Volumes[0].{type:VolumeType,size:Size,iops:Iops,throughput:Throughput,state:State}'

# 2. Confirm the instance's EBS ceiling (the real cap)
aws ec2 describe-instance-types --instance-types m6i.large \
  --query 'InstanceTypes[0].EbsInfo.EbsOptimizedInfo'

# 3. Measure actual achieved performance against CloudWatch
#    Sum is the operation count per 300 s period; divide by 300 for average IOPS.
aws cloudwatch get-metric-statistics --namespace AWS/EBS \
  --metric-name VolumeReadOps --dimensions Name=VolumeId,Value=vol-0abc123 \
  --start-time "$START" --end-time "$END" \
  --period 300 --statistics Sum

# 4. Check whether the instance is throttling EBS (Nitro burst-balance / throughput)
#    A persistently low VolumeThroughputPercentage or exhausted BurstBalance == bottleneck found
aws cloudwatch get-metric-statistics --namespace AWS/EBS \
  --metric-name VolumeThroughputPercentage --dimensions Name=VolumeId,Value=vol-0abc123 \
  --start-time "$START" --end-time "$END" \
  --period 300 --statistics Average

For EFS, confirm throughput mode and watch the burst/IO limit percentage:

aws efs describe-file-systems --file-system-id fs-0abc123 \
  --query 'FileSystems[0].{mode:ThroughputMode,prov:ProvisionedThroughputInMibps,perf:PerformanceMode}'

# Last hour in UTC; GNU date (Linux) first, BSD/macOS syntax as the fallback.
START=$(date -u -d '1 hour ago' '+%Y-%m-%dT%H:%M:%SZ' 2>/dev/null \
  || date -u -v-1H '+%Y-%m-%dT%H:%M:%SZ')
END=$(date -u '+%Y-%m-%dT%H:%M:%SZ')

# PercentIOLimit near 100 on General Purpose means you should consider Elastic/Max I/O
aws cloudwatch get-metric-statistics --namespace AWS/EFS \
  --metric-name PercentIOLimit --dimensions Name=FileSystemId,Value=fs-0abc123 \
  --start-time "$START" --end-time "$END" \
  --period 300 --statistics Maximum

The CloudWatch metrics that reveal each ceiling

This is the reference you keep open while diagnosing. Each metric points at exactly one limit:

| Metric | Namespace | Near its limit means | Confirms |
| --- | --- | --- | --- |
| VolumeReadOps / VolumeWriteOps | AWS/EBS | (rate) approaching provisioned IOPS | Volume IOPS ceiling |
| VolumeThroughputPercentage | AWS/EBS | Low % despite load = throttled | Instance EBS bandwidth cap |
| VolumeQueueLength | AWS/EBS | Persistently high = saturated/queued | Device saturation or shallow concurrency |
| BurstBalance | AWS/EBS | Draining toward 0 (st1/gp2) | Burst-credit starvation |
| VolumeReadBytes / VolumeWriteBytes | AWS/EBS | (rate) approaching provisioned throughput | Volume throughput ceiling |
| PercentIOLimit | AWS/EFS | Near 100 on General Purpose | EFS perf-mode ceiling → consider Elastic/Max I/O |
| BurstCreditBalance | AWS/EFS | Draining toward 0 | EFS Bursting starvation |
| MeteredIOBytes | AWS/EFS | Tracks billed throughput | EFS cost driver |

Architecture at a glance

The diagram below traces a single I/O request from the application down to durable storage and shows where each ceiling sits. Read it left to right as the data path: the application issues reads and writes with a particular block size and queue depth; those land on the EC2 instance, whose Nitro EBS-optimized link has a published baseline and burst — the first ceiling, and the one nobody checks first. From the instance, block traffic crosses to the EBS volume (gp3 or io2 Block Express), which has its own per-volume IOPS and throughput ceiling, and file traffic goes to the EFS mount target over NFSv4.1/TLS, governed by the chosen throughput mode. Underneath, EBS snapshots in S3 and EFS lifecycle to IA form the durability and cost tier — and the snapshot path is where lazy-load latency hides on a fresh restore.

The badges mark the five places performance actually dies. Badge 1 sits on the instance link (instance EBS baseline caps you below the volume’s rated number); badge 2 on the gp3 volume (left at 3,000/125, or throughput bought without the IOPS to back it); badge 3 on io2 Block Express (the right call only above gp3’s ceiling); badge 4 on the EFS mount (Bursting on a near-empty filesystem starves); badge 5 on the snapshot restore (no FSR means the first read of every block fetches from S3). Follow the numbered legend to turn each badge into a symptom you can confirm with one CloudWatch metric and a fix you can apply with one CLI call. The governing rule the whole diagram teaches: achieved performance is min(instance, volume, filesystem), so the only move that helps is to raise the term that is actually binding.

[Figure: EBS and EFS storage performance architecture. An application issues block and file I/O through an EC2 instance with a Nitro EBS-optimized link to gp3 and io2 Block Express volumes and an EFS mount target over NFS/TLS, backed by EBS snapshots in S3 with Fast Snapshot Restore and EFS lifecycle to Infrequent Access. Five numbered badges mark the instance EBS baseline cap, the gp3 default-throttle and throughput-per-IOPS trap, the io2 over-provisioning choice, EFS Bursting credit starvation, and snapshot lazy-load on restore.]

Real-world scenario

A fintech platform team — call them Aarna Pay — ran a PostgreSQL fleet on r5.2xlarge instances, each with a single 4 TiB gp3 volume provisioned to the full 16,000 IOPS and 1,000 MiB/s. Their batch reconciliation job — a heavy nightly read-write pass over the day’s settlement data — consistently flatlined at roughly 600 MiB/s no matter how high they pushed the volume’s provisioned throughput, and p99 query latency spiked into the seconds during the window. The on-call instinct was “buy more IOPS,” and they had, twice, with no effect except the spend going up. The reconciliation window kept growing past its SLA, threatening the morning settlement cut-off.

The constraint was the instance, not the volume. An r5.2xlarge delivers a baseline of about 593.75 MiB/s (4,750 Mbps) of EBS throughput — almost exactly the ceiling they kept hitting. VolumeThroughputPercentage sat low even at peak, the tell-tale of an instance-side throttle rather than a volume that’s maxed. The volume was provisioned 68% beyond anything the instance could ever consume; they were paying for 1,000 MiB/s and physically capped at ~594. A two-minute describe-instance-types would have shown it on day one.
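The two figures in that paragraph are quick to check by hand. A small sketch of the arithmetic, with hypothetical helper names: dividing the published megabit figure by 8 gives megabytes per second, and the over-provisioning percentage is the provisioned number relative to that cap.

```shell
# Hypothetical helpers for the arithmetic in the scenario.
# Convert a published EBS bandwidth figure (Mbps) to megabytes per second.
mbps_to_mbytes() {
  awk -v m="$1" 'BEGIN { printf "%.2f\n", m / 8 }'
}

# Percent by which a provisioned number exceeds what the instance can deliver.
overprovision_pct() {
  awk -v p="$1" -v cap="$2" 'BEGIN { printf "%.0f\n", (p / cap - 1) * 100 }'
}

cap=$(mbps_to_mbytes 4750)
echo "instance cap: $cap MB/s"
echo "volume over-provisioned by: $(overprovision_pct 1000 "$cap")%"
```

The same two lines work for any instance: take the bandwidth that describe-instance-types reports, convert it, and compare it with what the volume is provisioned for.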

Two changes fixed it. They moved the database to r6i.4xlarge, which delivers a sustained ~1,187.5 MiB/s baseline (and, being a larger size, no 30-minute burst cliff), and they migrated the hottest volumes to io2 Block Express for the latency floor under concurrent load. They also right-sized the volume’s provisioned throughput down to match the new instance baseline, recovering the over-provisioning spend. They codified the rule so it can’t regress: provisioned volume throughput must never exceed the instance’s published EBS baseline.

# Guardrail: cap provisioned throughput at the instance's EBS baseline.
# Fetch the instance EBS baseline at plan time and clamp the volume to it.
data "aws_ec2_instance_type" "db" {
  instance_type = "r6i.4xlarge"
}

locals {
  # Baseline EBS throughput in MB/s, as exposed by the aws_ec2_instance_type data source.
  instance_ebs_baseline_mbps = data.aws_ec2_instance_type.db.ebs_performance_baseline_throughput
}

# The clamp applies to gp3 volumes: throughput is configurable only on gp3,
# while io2 derives its throughput from provisioned IOPS.
resource "aws_ebs_volume" "pg_data" {
  availability_zone = "us-east-1a"
  size              = 4096
  type              = "gp3"
  iops              = 16000
  # Provisioning beyond the instance baseline is wasted money; clamp it.
  throughput        = min(1000, floor(local.instance_ebs_baseline_mbps))
  encrypted         = true
}

The reconciliation window dropped from 50 minutes to 22, p99 latency fell back under 10 ms, and the monthly storage bill went down because the over-provisioned IOPS were trimmed. The lesson the team internalized: storage performance is min(volume, instance), and the instance limit is the one nobody checks first. The before/after, with the metric that proved each step:

| Phase | Instance | Volume config | Achieved throughput | p99 latency | Proof metric |
| --- | --- | --- | --- | --- | --- |
| Before | r5.2xlarge | gp3 16,000/1,000 | ~600 MiB/s (capped) | seconds | VolumeThroughputPercentage low |
| “Buy more IOPS” | r5.2xlarge | gp3 16,000/1,000 (again) | ~600 MiB/s (unchanged) | seconds | No change: wrong knob |
| Resize instance | r6i.4xlarge | gp3 16,000/1,000 | ~1,000 MiB/s | < 50 ms | VolumeThroughputPercentage healthy |
| Migrate + right-size | r6i.4xlarge | io2 64,000, throughput clamped | ~1,000 MiB/s | < 10 ms | p99 under SLA; bill down |

Advantages and disadvantages

The decoupled, software-defined storage model both enables precise tuning and invites the over-provisioning mistakes this article exists to prevent. Weigh it honestly:

| Advantages (why this model helps you) | Disadvantages (why it bites) |
| --- | --- |
| IOPS, throughput, and capacity are independent purchases: pay for exactly the shape you need | Three knobs means three ways to mis-size; conflating them overspends and under-provisions at once |
| Elastic Volumes change type/IOPS/throughput online, with no downtime to tune | The 6-hour cooldown and optimizing state mean you can’t thrash changes during an incident |
| The instance EBS limit is published and queryable (describe-instance-types) | It’s invisible by default: the volume reports its full number while the instance silently caps you |
| EFS Elastic throughput removes provisioning and cliffs entirely | Bursting (the cheap mode) starves a near-empty filesystem, the #1 EFS complaint |
| Snapshots are incremental and cheap; DLM automates retention + DR copy | A fresh restore is lazy-loaded: slow first touch unless you pay for FSR |
| RAID 0 striping beats the single-volume ceiling up to the instance limit | RAID 0 has zero redundancy and breaks crash-consistency of snapshots unless you freeze the FS |
| io2 Block Express delivers sub-ms latency and huge ceilings | Easy to over-reach for: many gp3 workloads land on io2 at 3–5× the cost for unused headroom |
| Everything is measurable with fio + CloudWatch | A bad fio config (shallow iodepth, page cache on) “proves” a volume is slow when the test was the cap |

The model is right for any workload where you want to size storage to measured demand rather than buy a fixed appliance. It rewards teams who measure (fio + CloudWatch) and codify guardrails (clamp throughput to the instance baseline); it punishes muscle-memory sizing — picking io2 by reflex, leaving gp3 at defaults, mounting EFS on Bursting, or benchmarking a lazy-loaded restore. The disadvantages are all knowable and measurable — which is the entire point of treating storage as min(volume, instance, app) and finding the binding term before you spend.

Hands-on lab

Provision a gp3 volume, prove it’s throttled at the default, measure it with fio, raise the knobs online, and confirm the gain — all on one small instance you delete at the end. Run from a session on an Amazon Linux 2023 EC2 instance (a t3.large or m6i.large is fine and cheap).

Step 1 — Variables and a gp3 volume at the default (3,000 / 125).

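# Amazon Linux 2023 enforces IMDSv2 by default, so fetch a session token first.
TOKEN=$(curl -s -X PUT http://169.254.169.254/latest/api/token \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
AZ=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/placement/availability-zone)
IID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-id)
VOL=$(aws ec2 create-volume --availability-zone "$AZ" --size 100 \
  --volume-type gp3 --encrypted \
  --query VolumeId --output text)
echo "Volume: $VOL in $AZ on $IID"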

Expected: a vol-... id. At this point IOPS=3,000 and throughput=125 MiB/s (the defaults).

Step 2 — Attach, format, mount.

aws ec2 wait volume-available --volume-ids "$VOL"
aws ec2 attach-volume --volume-id "$VOL" --instance-id "$IID" --device /dev/sdf
sleep 5
DEV=$(lsblk -o NAME,SERIAL | grep "${VOL#vol-}" | awk '{print "/dev/"$1}')
sudo mkfs.xfs "$DEV" && sudo mkdir -p /data && sudo mount "$DEV" /data

Expected: /data mounted on the new device. (On Nitro the device appears as /dev/nvme*, hence the serial lookup.)

Step 3 — Benchmark at the default and record the ceiling.

sudo fio --name=base --filename=/data/fiotest --direct=1 --rw=randread \
  --bs=16k --iodepth=64 --numjobs=4 --group_reporting \
  --size=5G --runtime=60 --time_based --ioengine=libaio | grep -E 'IOPS|BW'

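Expected: IOPS pinned near 3,000, the gp3 baseline, regardless of how deep you drive it. At 16 KiB per I/O that is only about 47 MiB/s, so this test hits the IOPS ceiling long before the 125 MiB/s throughput ceiling; a large-block sequential run (--rw=read --bs=1M) is what pins bandwidth near 125 MiB/s. This is the throttle the default imposes.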

Step 4 — Raise IOPS and throughput online with Elastic Volumes.

aws ec2 modify-volume --volume-id "$VOL" --iops 8000 --throughput 500
# Check until the modification leaves 'modifying'/'optimizing' (this command takes the plural --volume-ids)
aws ec2 describe-volumes-modifications --volume-ids "$VOL" \
  --query 'VolumesModifications[0].[ModificationState,Progress]' --output text

Expected: state progresses modifying → optimizing → completed. The volume stays mounted and usable throughout.
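If you would rather not re-run the status command by hand, a small polling loop can watch the state for you. This is a sketch, not an AWS-provided waiter: `wait_for_state` is a hypothetical helper, and the interval and retry count in the usage line are arbitrary choices.

```shell
# Hypothetical helper: poll a command until it prints the target state.
# Usage: wait_for_state <target> <interval_seconds> <max_tries> <command...>
# Returns 0 on target, 1 if the command prints "failed", 2 on timeout.
wait_for_state() {
  wfs_target=$1; wfs_interval=$2; wfs_max=$3
  shift 3
  wfs_try=0
  while [ "$wfs_try" -lt "$wfs_max" ]; do
    wfs_state=$("$@")
    echo "state: $wfs_state" >&2
    case "$wfs_state" in
      "$wfs_target") return 0 ;;
      failed)        return 1 ;;
    esac
    sleep "$wfs_interval"
    wfs_try=$((wfs_try + 1))
  done
  return 2
}

# Usage in this lab (assumes $VOL from Step 1):
# wait_for_state completed 30 120 \
#   aws ec2 describe-volumes-modifications --volume-ids "$VOL" \
#   --query 'VolumesModifications[0].ModificationState' --output text
```

The new performance is usable once the state reaches optimizing, so passing optimizing as the target is a reasonable choice when you only want to proceed to the next benchmark.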

Step 5 — Re-benchmark and confirm the gain.

sudo fio --name=tuned --filename=/data/fiotest --direct=1 --rw=randread \
  --bs=16k --iodepth=64 --numjobs=4 --group_reporting \
  --size=5G --runtime=60 --time_based --ioengine=libaio | grep -E 'IOPS|BW'

Expected: IOPS now climbs toward 8,000, which at 16 KiB per I/O is about 125 MiB/s. To exercise the 500 MiB/s throughput setting, rerun with --rw=read --bs=1M; bandwidth then climbs toward 500 MiB/s, provided the instance’s EBS limit allows it. On an instance whose EBS limit exceeds 500 MiB/s you’ll see throughput land near the volume number; on a smaller instance you’ll hit the instance cap first, which is exactly the lesson.

Step 6 — Prove the instance ceiling is real.

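# IMDSv2: fetch a session token, then read the instance type.
TOKEN=$(curl -s -X PUT http://169.254.169.254/latest/api/token \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300")
TYPE=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-type)
aws ec2 describe-instance-types --instance-types "$TYPE" \
  --query 'InstanceTypes[0].EbsInfo.EbsOptimizedInfo.{baseMBps:BaselineThroughputInMBps,baseIOPS:BaselineIops}' \
  --output table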

Expected: the baseline MiB/s and IOPS the instance allows. Compare to your fio bandwidth: if fio matched this number rather than the volume’s 500, you just observed min(volume, instance) with your own eyes.

Validation checklist. You provisioned gp3 at the default and saw it throttle at 3,000/125; raised IOPS/throughput online with zero downtime; re-measured a real gain; and confirmed the instance EBS baseline is a separate, often-lower ceiling. The steps mapped to what each proves:

| Step | What you did | What it proves | Real-world analogue |
| --- | --- | --- | --- |
| 3 | Benchmark gp3 at default | The 3,000/125 default is a real throttle | “Why is my new volume slow?” |
| 4 | modify-volume online | Tuning needs no detach/downtime | Right-sizing a live production volume |
| 5 | Re-benchmark tuned | Raising the knobs actually helps | The fix after the diagnosis |
| 6 | describe-instance-types | The instance is a separate ceiling | The bill full of unreachable numbers |

Cleanup (avoid lingering volume + snapshot charges).

sudo umount /data
aws ec2 detach-volume --volume-id "$VOL"
aws ec2 wait volume-available --volume-ids "$VOL"
aws ec2 delete-volume --volume-id "$VOL"

Cost note. A 100 GiB gp3 volume for an hour is a few rupees; the provisioned IOPS/throughput above baseline add a little more while modified. Deleting the volume stops all of it. (There is no free-tier gp3 with provisioned IOPS, but an hour of this lab is well under ₹50.)

Common mistakes & troubleshooting

Before the playbook, the error and status reference — the exact strings, states, and API errors you’ll see, what each means, and the immediate move. These are the messages that surface from the CLI, the volume state machine, and the OS when storage tuning goes wrong:

| String / state / error | Where it appears | Meaning | Immediate move |
| --- | --- | --- | --- |
| VolumeModificationRateExceeded | modify-volume API | Modified within the last 6 hours | Wait for the 6-hour cooldown |
| Volume state optimizing | describe-volumes-modifications | Modify applied; perf between old/new | Wait it out; do not re-modify |
| InvalidParameterValue: throughput too high for iops | modify-volume / create-volume | gp3 0.25 MiB/s-per-IOPS rule violated | Raise IOPS first (1,000 MiB/s ⇒ ≥ 4,000) |
| iops ... exceeds the ratio | create-volume (io2) | IOPS > 1,000 × GiB | Increase size or lower IOPS |
| VolumeInUse | delete-volume / attach-volume | Still attached (or attaching elsewhere) | Detach first; check Multi-Attach |
| IncorrectState: available | detach-volume | Already detached | No action; it’s free |
| xfs ... corruption / EXT4-fs error (dmesg) | OS kernel log | Single-writer FS on Multi-Attach, or bad RAID | Use a cluster FS; fsck offline |
| No space left on device after grow | OS | Grew volume, not the filesystem | growpart + xfs_growfs/resize2fs |
| mount.nfs4: Connection timed out (EFS) | OS mount | Security group / mount target / no tls helper | Open 2049; use amazon-efs-utils |
| BurstBalance at 0 (alarm) | CloudWatch (EBS) | st1/gp2 burst credits exhausted | Size up; or provisioned gp3 |
| BurstCreditBalance at 0 (alarm) | CloudWatch (EFS) | EFS Bursting starved | Switch to Elastic throughput |
| PercentIOLimit ≈ 100 (alarm) | CloudWatch (EFS) | General Purpose IOPS ceiling hit | Move to Elastic (or Max I/O legacy) |

This is the playbook — the part you bookmark. First as a scannable table you can read mid-incident, then the same entries with the full confirm-command detail underneath.

| # | Symptom | Root cause | Confirm (exact cmd / metric) | Fix |
| --- | --- | --- | --- | --- |
| 1 | Throughput flatlines well below the volume’s number | Instance EBS baseline is the cap | describe-instance-types ... EbsOptimizedInfo; VolumeThroughputPercentage low | Resize instance to a larger size/family |
| 2 | New gp3 volume “slow” at 3,000 IOPS / 125 MiB/s | Left at the default; never provisioned up | describe-volumes Iops=3000, Throughput=125 | modify-volume --iops --throughput |
| 3 | Raised throughput but it didn’t increase | Not enough provisioned IOPS to back it (gp3 0.25 MiB/s/IOPS) | Provisioned IOPS < throughput/0.25 | Raise IOPS first (1,000 MiB/s needs ≥ 4,000) |
| 4 | EFS crawls; small filesystem | Bursting mode + near-empty FS out of credits | BurstCreditBalance → 0; ThroughputMode=bursting | Switch to Elastic throughput mode |
| 5 | Restored DR volume reads at a fraction of rated speed | Snapshot lazy-load (no FSR) | First-read latency >> steady state | Enable FSR or pre-warm by reading all blocks |
| 6 | fio shows low IOPS despite headroom | iodepth/numjobs too shallow; single-threaded | Raise iodepth/numjobs → IOPS rises | Deepen queue; parallelize the workload |
| 7 | fio numbers absurdly high, then production slow | Page cache not bypassed (no O_DIRECT) | --direct=1 collapses the number to real | Always benchmark with --direct=1 |
| 8 | Modification “stuck”; performance between old/new | optimizing state after modify | describe-volumes-modifications = optimizing | Wait it out; don’t re-modify (6 h cooldown) |
| 9 | “Modify failed: too soon” | Modified within the last 6 hours | Last modification < 6 h ago | Wait for the 6-hour cooldown |
| 10 | Grew the volume but the filesystem is still small | Didn’t grow partition/FS inside the OS | lsblk device big, df -h FS small | growpart + xfs_growfs/resize2fs |
| 11 | Two instances mounted one volume; corruption | Plain xfs/ext4 RW on a Multi-Attach volume | Filesystem errors in dmesg | Use a cluster FS (GFS2/OCFS2) or don’t multi-attach |
| 12 | st1 fast then slow under sustained reads | Throughput burst credits exhausted | BurstBalance draining to 0 | Size up; or move to provisioned gp3 |
| 13 | EFS PercentIOLimit pegged at ~100% | General Purpose perf-mode IOPS ceiling | PercentIOLimit near 100 | Move to Elastic throughput (or Max I/O legacy) |
| 14 | Latency p99 spikes at the 30-minute mark | Relied on instance EBS burst, not baseline | Throttle begins exactly after ~30 min | Size against the baseline, larger instance |

The expanded form, with the full reasoning for the entries that bite hardest:

1. Throughput flatlines well below the volume’s provisioned number. Root cause: The instance EBS baseline is lower than the volume’s ceiling — the classic, most expensive mistake. Confirm: aws ec2 describe-instance-types --instance-types <type> --query 'InstanceTypes[0].EbsInfo.EbsOptimizedInfo'; CloudWatch VolumeThroughputPercentage sits low even at peak (a volume that’s truly maxed reads ~100%). Fix: Resize the instance to a larger size or family whose baseline ≥ your target; never provision volume throughput past the instance baseline for sustained work.

2. A brand-new gp3 volume is “slow” — capped at 3,000 IOPS / 125 MiB/s. Root cause: gp3 ships at the baseline default; provisioning above it is opt-in and was never done. Confirm: aws ec2 describe-volumes --volume-ids <vol> --query 'Volumes[0].{iops:Iops,tput:Throughput}' returns 3000 / 125. Fix: aws ec2 modify-volume --volume-id <vol> --iops <n> --throughput <m> (online).

3. You raised throughput but achieved bandwidth didn’t move. Root cause: gp3 enforces ≤ 0.25 MiB/s per provisioned IOPS — you bought MiB/s without the IOPS to back it. Confirm: provisioned IOPS < target throughput / 0.25 (e.g. asking 1,000 MiB/s with only 3,000 IOPS). Fix: Raise IOPS first — 1,000 MiB/s requires ≥ 4,000 provisioned IOPS — then the throughput is allowed.

4. EFS crawls and it’s a small filesystem. Root cause: Bursting throughput mode on a near-empty filesystem (50 KiB/s per GiB baseline) that has exhausted its burst credits. Confirm: CloudWatch BurstCreditBalance trending to zero; aws efs describe-file-systems --file-system-id <fs> --query 'FileSystems[0].ThroughputMode' returns bursting. Fix: aws efs update-file-system --file-system-id <fs> --throughput-mode elastic; throughput then scales with demand, with no credit cliff.

5. A volume restored from a snapshot reads at a fraction of its rated speed. Root cause: Lazy loading — blocks fetch from S3 on first touch; you’re measuring S3 latency, not the volume. Confirm: the first read of each region is slow and the second is fast; steady-state matches the spec after a full pass. Fix: Enable Fast Snapshot Restore on the snapshot in the target AZs, or pre-warm by reading every block (dd if=/dev/nvmeXn1 of=/dev/null bs=1M).

6. fio reports low IOPS even though the volume and instance have headroom. Root cause: Too-shallow queue depth or single-threaded I/O — the device is under-driven, not slow. Confirm: raising --iodepth and --numjobs increases IOPS; at iodepth=1 you measure latency-bound, not the device ceiling. Fix: Drive a deeper queue (32–64) and more jobs that match real concurrency; fix single-threaded application I/O.

7. fio shows impossibly high numbers, but production is slow. Root cause: The benchmark hit the page cache (RAM), not the device — --direct=1 was missing. Confirm: adding --direct=1 drops the number to a believable device figure. Fix: Always benchmark device performance with --direct=1 (O_DIRECT).

8 & 9. Modification seems stuck, or “modify failed: too soon.” Root cause: After a modify the volume enters optimizing (performance between old and new), and a volume can be modified only once per 6 hours. Confirm: aws ec2 describe-volumes-modifications --volume-ids <vol> shows optimizing; a second modify inside 6 h is rejected. Fix: Wait out optimizing; plan changes so you don’t need a second modify inside the 6-hour window.

10. You grew the volume but the filesystem is still the old size. Root cause: Growing the EBS volume enlarges the block device, not the partition/filesystem inside the OS. Confirm: lsblk shows the larger device; df -h shows the old filesystem size. Fix: sudo growpart /dev/nvme0n1 1 then sudo xfs_growfs -d /mount (xfs) or sudo resize2fs /dev/nvme0n1p1 (ext4).

11. Two instances mounted one volume and it corrupted. Root cause: A Multi-Attach io2 volume mounted xfs/ext4 read-write on more than one instance — those filesystems assume single-writer. Confirm: filesystem inconsistency errors in dmesg/journal on both nodes. Fix: Use a cluster-aware filesystem (GFS2/OCFS2) with proper fencing, or don’t multi-attach a single-writer filesystem.

12. st1 is fast initially, then slows under sustained reads. Root cause: st1’s throughput burst credits are exhausted; you’ve dropped to the baseline. Confirm: CloudWatch BurstBalance draining toward 0. Fix: Size the st1 volume larger (baseline scales with size), or switch to a provisioned gp3 if latency matters.

13. EFS PercentIOLimit is pegged near 100%. Root cause: You’ve hit the General Purpose performance-mode IOPS ceiling. Confirm: CloudWatch PercentIOLimit at ~100 sustained. Fix: Move to Elastic throughput (raises the effective ceiling for most workloads); Max I/O is the legacy alternative but costs latency and is immutable.

14. Latency p99 spikes right at the 30-minute mark. Root cause: The workload leaned on the instance’s EBS burst rather than the baseline; the burst window ended. Confirm: throttling begins ~30 minutes into sustained load; the instance is a smaller size that bursts. Fix: Size against the baseline — choose a larger instance whose baseline meets sustained demand.
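For entry 3, the IOPS a gp3 throughput target needs can be computed before calling modify-volume. A minimal sketch assuming the 0.25 MiB/s-per-IOPS ratio and the 3,000 IOPS baseline stated in this article; `gp3_min_iops_for_throughput` is a hypothetical helper name:

```shell
# Hypothetical helper: minimum provisioned IOPS gp3 needs for a throughput target.
# Uses the 0.25 MiB/s-per-IOPS ratio from entry 3 and never goes below the
# 3,000 IOPS baseline every gp3 volume already has.
gp3_min_iops_for_throughput() {
  awk -v t="$1" 'BEGIN {
    need = t / 0.25
    if (need < 3000) need = 3000
    printf "%d\n", need
  }'
}

gp3_min_iops_for_throughput 1000   # 4000
gp3_min_iops_for_throughput 750    # 3000
gp3_min_iops_for_throughput 500    # 3000
```

Pass the result as --iops in the same modify-volume call that sets --throughput, so the pair is valid together and you spend only one modification inside the 6-hour window.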

Best practices

The alarms worth wiring before the next incident, and why each is leading rather than lagging:

| Alarm on | Namespace / metric | Threshold (starting point) | Why it’s leading |
| --- | --- | --- | --- |
| Instance EBS throttle | AWS/EBS VolumeThroughputPercentage | < 100% while load is high, 10 min | Catches instance-bound before “it’s slow” tickets |
| EBS burst starvation | AWS/EBS BurstBalance | < 20% and falling | Predicts the st1/gp2 throttle cliff |
| Volume saturation | AWS/EBS VolumeQueueLength | Sustained high (> 1 per 500 provisioned IOPS) | I/O queuing before latency blows up |
| EFS credit starvation | AWS/EFS BurstCreditBalance | Trending to 0 | The near-empty-Bursting failure, pre-emptively |
| EFS IOPS ceiling | AWS/EFS PercentIOLimit | > 90% sustained | Perf-mode ceiling before throughput collapses |
| EFS cost creep | AWS/EFS MeteredIOBytes | Above budget baseline | Elastic per-GB charges climbing |
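As one concrete starting point, the EBS burst-starvation row can be wired with a single put-metric-alarm call. This is a sketch under assumptions: the wrapper name, the alarm name, the two evaluation periods, and the SNS topic in the usage line are all placeholders, while the 20% threshold comes from the table.

```shell
# Hypothetical wrapper around `aws cloudwatch put-metric-alarm`.
# Usage: create_burst_balance_alarm <volume-id> <sns-topic-arn>
# Fires when average BurstBalance stays below 20% for two 5-minute periods
# (the period count is an assumption; tune it to your tolerance for noise).
create_burst_balance_alarm() {
  aws cloudwatch put-metric-alarm \
    --alarm-name "ebs-burst-balance-low-$1" \
    --namespace AWS/EBS \
    --metric-name BurstBalance \
    --dimensions "Name=VolumeId,Value=$1" \
    --statistic Average \
    --period 300 \
    --evaluation-periods 2 \
    --threshold 20 \
    --comparison-operator LessThanThreshold \
    --alarm-actions "$2"
}

# Example with placeholder identifiers:
# create_burst_balance_alarm vol-0abc123 arn:aws:sns:us-east-1:123456789012:storage-alerts
```

The other rows follow the same shape; swap the namespace, metric name, dimension (FileSystemId for EFS), threshold, and comparison operator.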

Security notes

The controls that secure storage, what each defends against, and the performance cost:

| Control | Mechanism | Secures against | Performance cost |
| --- | --- | --- | --- |
| EBS encryption at rest | encrypted=true + KMS CMK | Disk/snapshot data theft | Negligible (Nitro offload) |
| EFS encryption in transit | tls mount + deny non-TLS policy | Network sniffing of NFS | Minimal (stunnel) |
| EFS access points | POSIX root + identity per app | Cross-tenant file access | None |
| EFS IAM auth | iam mount + filesystem policy | Leaked mount target without creds | None |
| Snapshot sharing controls | CreateVolumePermission audit | Public/wrong-account data leak | None |
| Destructive-action IAM scoping | Tag-conditioned Detach/Modify/Delete | Accidental/malicious volume wipe | None |
| Cross-Region copy key grants | KMS key policy / grants | Unreadable or failed DR copies | None |

Cost & sizing

A rough monthly picture for a mid-tier production database volume and a shared filesystem: a 4 TiB gp3 at the default baseline is roughly ₹30,000 (about ₹7–8 per GiB); raising it to 8,000 IOPS / 500 MiB/s adds a modest IOPS+throughput charge; the same workload on io2 at 64,000 IOPS is several times that. A 1 TiB EFS on Standard with lifecycle to IA can cut storage cost by more than half versus all-Standard. The bill drivers, what each one buys you, and how they interact with the tuning decisions:

| Cost driver | What you pay for | Rough INR / month (illustrative) | What it fixes | Watch-out |
| --- | --- | --- | --- | --- |
| gp3 capacity (per GiB) | Storage, baseline 3,000/125 included | ~₹7–8 per GiB → 4 TiB ≈ ₹30,000 | Baseline performance for free | Capacity ≠ performance; size data, not air |
| gp3 provisioned IOPS | IOPS above 3,000 | Small per-IOPS-month above baseline | Random-small headroom | Buying IOPS the instance can’t consume |
| gp3 provisioned throughput | MiB/s above 125 | Small per-MiB/s-month above baseline | Sequential headroom | Needs IOPS to back it (0.25 rule) |
| io2 capacity + IOPS | Higher per-GiB + per-IOPS | Several× gp3 for the same shape | Sub-ms latency, > 16,000 IOPS | Over-reached for gp3 workloads |
| EFS Standard storage | Per-GiB-month, multi-AZ | Higher than EBS per-GiB | Shared, multi-AZ file access | All-Standard when IA would do |
| EFS lifecycle to IA | Cheaper per-GiB on cold files | Cuts storage cost > 50% typically | Cold-data cost | IA read fee on files that go hot |
| EFS Elastic throughput | Per-GB transferred | Scales with use | Spiky workloads, no cliffs | Steady very-high load can cost more |
| FSR | Per AZ per hour while enabled | Hourly per AZ | Fast first-touch restore | Leaving it on idle burns money |
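
The three gp3 rows combine into a single bill: capacity, plus IOPS above the 3,000 baseline, plus throughput above the 125 MiB/s baseline. A minimal sketch of that arithmetic follows. The per-GiB rate of ₹7.5 is the midpoint of the illustrative range in the table; the per-IOPS and per-MiB/s rates are placeholders chosen only to make the example run, not quoted AWS prices, so substitute your Region's current figures.

```python
GP3_BASELINE_IOPS = 3000
GP3_BASELINE_MIBPS = 125


def gp3_monthly_cost(
    size_gib: float,
    iops: int,
    throughput_mibps: float,
    rate_per_gib: float,
    rate_per_extra_iops: float,
    rate_per_extra_mibps: float,
) -> float:
    """Monthly gp3 cost: capacity plus whatever is provisioned above baseline."""
    extra_iops = max(0, iops - GP3_BASELINE_IOPS)
    extra_mibps = max(0, throughput_mibps - GP3_BASELINE_MIBPS)
    return (
        size_gib * rate_per_gib
        + extra_iops * rate_per_extra_iops
        + extra_mibps * rate_per_extra_mibps
    )


# Placeholder rates in INR (assumptions, not AWS list prices).
RATES = dict(rate_per_gib=7.5, rate_per_extra_iops=0.5, rate_per_extra_mibps=4.0)

baseline = gp3_monthly_cost(4096, 3000, 125, **RATES)
tuned = gp3_monthly_cost(4096, 8000, 500, **RATES)
print(f"4 TiB at baseline: {baseline:,.0f}")
print(f"4 TiB at 8,000 IOPS / 500 MiB/s: {tuned:,.0f}")
```

With any realistic rates the shape is the same: capacity dominates, and the performance uplift is a small increment, which is why buying performance by oversizing capacity is the expensive route.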

Interview & exam questions

1. A volume is provisioned for 1,000 MiB/s but the workload flatlines at ~594 MiB/s. What’s happening and how do you confirm? The instance EBS baseline is the cap, not the volume. An m6i.large/r5.2xlarge delivers ~4,750 Mbps (~594 MiB/s) of EBS bandwidth; the volume’s number is unreachable on that instance. Confirm with describe-instance-types ... EbsOptimizedInfo and a low VolumeThroughputPercentage. Fix by resizing the instance, not buying more volume.

2. How do gp3 and io2 differ from gp2 in how you provision performance? On gp2, IOPS were coupled to size (3 IOPS/GiB), so you oversized to buy performance. gp3 decouples capacity, IOPS, and throughput into independent purchases: the baseline is 3,000 IOPS / 125 MiB/s, tunable to 16,000 / 1,000. io2 decouples IOPS from capacity, with throughput following the IOPS you provision; io2 Block Express reaches 256,000 IOPS / 4,000 MiB/s. Either way, you size performance separately from capacity.

3. On gp3, you raise throughput to 1,000 MiB/s but it won’t take effect. Why? gp3 enforces a maximum of 0.25 MiB/s per provisioned IOPS, so 1,000 MiB/s requires at least 4,000 provisioned IOPS. At the 3,000 baseline the most you can provision is 3,000 × 0.25 = 750 MiB/s, so the request for 1,000 is not accepted. Raise IOPS to ≥ 4,000 first, then the throughput is allowed.

4. When do you choose io2 Block Express over gp3? Only when you need what gp3 can’t give: sustained IOPS above 16,000, sub-millisecond latency under concurrency (gp3 delivers single-digit milliseconds), 99.999% durability, or volumes larger than 16 TiB. Otherwise gp3 serves the same workload at a fraction of the cost; picking io2 by reflex pays 3–5× for unused headroom.

5. Why does an EFS filesystem with little data crawl, and how do you fix it? It’s on Bursting throughput mode, whose baseline is 50 KiB/s per GiB stored — a near-empty filesystem has almost no baseline and survives only on burst credits, which then run out. Confirm with BurstCreditBalance draining to zero. Fix by switching to Elastic throughput, which scales with demand and has no credit cliff.

6. What is Fast Snapshot Restore and when is it essential? Normally a volume restored from a snapshot lazy-loads blocks from S3 on first touch, so the first read of each block is slow. FSR pre-initializes the volume so it delivers full provisioned performance immediately. It’s essential for golden-image boot volumes and time-critical DR restores, and it’s billed per AZ per hour while enabled.

7. Difference between EFS performance mode and throughput mode? Performance mode (General Purpose vs Max I/O, set at creation, immutable) trades per-operation latency against aggregate IOPS ceiling. Throughput mode (Elastic, Provisioned, Bursting, changeable with a cooldown) governs how much aggregate throughput you get and how you pay. People confuse them; they’re orthogonal — one is latency-vs-ceiling, the other is throughput-vs-cost.

8. You restored a DR volume and it benchmarks at a tenth of its rated speed. Is the volume broken? No — you’re measuring lazy loading (S3 fetch on first touch), not steady state. The second read of each block is fast. Either enable FSR before relying on the volume, or pre-warm by reading every block (dd ... of=/dev/null) so the benchmark reflects the device, not S3 latency.

9. When does RAID 0 striping help EBS performance, and what’s the catch? Striping aggregates N volumes’ per-volume ceilings, useful when a single volume’s IOPS/throughput ceiling is the limit and the instance has bandwidth headroom above it. The catch: RAID 0 has zero redundancy (rely on EBS durability + snapshots), and striping cannot exceed the instance EBS limit — if the instance is already the cap, striping buys nothing.

10. Your fio test shows great numbers but production is slow. What’s the likely test error? The benchmark probably hit the page cache (RAM) instead of the device — --direct=1 (O_DIRECT) was missing. Or iodepth/numjobs were too shallow and under-drove the device. Re-run with --direct=1, a workload-matched block size, and a deep enough queue, then compare against min(volume, instance).

11. Why size a sustained workload against the instance baseline rather than the burst? Smaller instances get a higher EBS bandwidth for a 30-minute burst, then fall back to the baseline. A sustained database that leaned on burst is fast in a short test and throttles exactly at the half-hour mark in production. Size against the baseline; choose a larger size/family if the baseline doesn’t meet sustained demand.

12. You can only attach an io2 volume to one instance — except when? And what’s the constraint? Multi-Attach lets an io2/io1 volume attach to up to 16 Nitro instances in the same AZ. The hard constraint: it provides no write coordination, so you must run a cluster-aware filesystem (GFS2/OCFS2) or an application that arbitrates writes. Mounting plain xfs/ext4 read-write on two instances corrupts the volume.
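
Two of the answers above reduce to arithmetic worth keeping at hand: the gp3 0.25 MiB/s-per-IOPS rule (question 3) and the min(volume, instance, app) equation behind question 1. A minimal sketch, assuming all limits are expressed in the same unit and using the article's rounded ~594 figure for the instance cap; the function names are illustrative.

```python
import math

# Ratio quoted in the text: gp3 allows at most 0.25 MiB/s per provisioned IOPS.
GP3_MAX_MIBPS_PER_IOPS = 0.25


def min_iops_for_throughput(throughput_mibps: float) -> int:
    """Smallest provisioned IOPS that permits the requested gp3 throughput."""
    return math.ceil(throughput_mibps / GP3_MAX_MIBPS_PER_IOPS)


def max_throughput_for_iops(iops: int) -> float:
    """Largest gp3 throughput (MiB/s) that a given IOPS setting permits."""
    return iops * GP3_MAX_MIBPS_PER_IOPS


def binding_limit(volume: float, instance: float, app: float = math.inf):
    """Return (name, value) of the smallest limit; all values share one unit."""
    limits = {"volume": volume, "instance": instance, "app": app}
    name = min(limits, key=limits.get)
    return name, limits[name]


print(min_iops_for_throughput(1000))          # 4000
print(max_throughput_for_iops(3000))          # 750.0
print(binding_limit(volume=1000, instance=594))  # ('instance', 594)
```

Whichever name `binding_limit` returns is the thing to change; raising either of the other two terms cannot improve achieved performance.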

The questions above map to AWS Certified Solutions Architect – Associate (SAA-C03), under designing cost-optimized and high-performing storage, and AWS Certified SysOps Administrator – Associate (SOA-C02), under monitoring and tuning EBS/EFS and CloudWatch storage metrics. The deep performance-tuning angle (instance limits, io2 Block Express, striping) also appears on the Solutions Architect – Professional (SAP-C02). A compact cert mapping for revision:

| Question theme | Primary cert | Exam objective area |
| --- | --- | --- |
| Volume type selection by workload | SAA-C03 | Design high-performing & cost-optimized storage |
| Instance EBS limit vs volume limit | SAP-C02 / SOA-C02 | Performance tuning; monitoring |
| gp3 decoupling + 0.25 ratio | SAA-C03 | Storage performance fundamentals |
| EFS performance/throughput modes | SAA-C03 | Design file storage solutions |
| Snapshots, FSR, DLM, DR copy | SOA-C02 | Backup, recovery, automation |
| CloudWatch storage metrics | SOA-C02 | Monitor, log, and remediate |

Quick check

  1. A gp3 volume is provisioned for 16,000 IOPS / 1,000 MiB/s, but the workload never exceeds ~594 MiB/s. Where is the bottleneck, and what one command confirms it?
  2. You raise a gp3 volume’s throughput to 1,000 MiB/s but it won’t apply while IOPS sits at 3,000. What rule are you hitting, and what do you change?
  3. True or false: switching an EFS filesystem from Bursting to Elastic throughput is the right fix for a small, near-empty filesystem that keeps running out of throughput.
  4. A volume restored from a snapshot benchmarks at a fraction of its rated speed. Name the cause and two ways to fix it.
  5. Your fio random-read test reports numbers far above the volume’s provisioned IOPS. What single flag is almost certainly missing, and what were you actually measuring?

Answers

  1. The instance EBS baseline is the cap — a volume’s provisioned number is unreachable if the instance can’t push it (e.g. ~594 MiB/s on an m6i.large/r5.2xlarge). Confirm with aws ec2 describe-instance-types --instance-types <type> --query 'InstanceTypes[0].EbsInfo.EbsOptimizedInfo', and note VolumeThroughputPercentage sitting low. Fix by resizing the instance, not the volume.
  2. The gp3 throughput-per-IOPS ratio — you can buy at most 0.25 MiB/s per provisioned IOPS, so 1,000 MiB/s needs ≥ 4,000 IOPS. Raise IOPS to at least 4,000 first; then the throughput change is allowed.
  3. True. Bursting’s baseline is 50 KiB/s per GiB stored, so a near-empty filesystem starves once burst credits run out. Elastic throughput scales with demand and removes the credit cliff — the correct fix.
  4. The cause is snapshot lazy loading — blocks fetch from S3 on first touch, so you’re measuring S3 latency, not the device. Fix by (a) enabling Fast Snapshot Restore on the snapshot in the target AZs, or (b) pre-warming by reading every block (dd if=/dev/nvmeXn1 of=/dev/null bs=1M) before benchmarking.
  5. --direct=1 (O_DIRECT) is missing — you were measuring the page cache (RAM), not the EBS device. Re-run with --direct=1 (and a deep enough iodepth/numjobs) to measure the real device, then compare against min(volume, instance).
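
Answer 3 is easier to believe with numbers. The sketch below applies the 50 KiB/s-per-GiB Bursting baseline exactly as stated in this article; it deliberately ignores burst credits and any minimum throughput AWS may apply to very small filesystems, so treat the output as the rule of thumb, not a guarantee. Function names are illustrative.

```python
# Bursting-mode baseline as stated in the text: 50 KiB/s per GiB stored.
BURSTING_BASELINE_KIBPS_PER_GIB = 50


def bursting_baseline_mibps(stored_gib: float) -> float:
    """Baseline throughput in MiB/s for a Bursting filesystem of this size."""
    return stored_gib * BURSTING_BASELINE_KIBPS_PER_GIB / 1024


def stored_gib_for_baseline(target_mibps: float) -> float:
    """Data (GiB) that must be stored for the baseline to reach the target."""
    return target_mibps * 1024 / BURSTING_BASELINE_KIBPS_PER_GIB


for size_gib in (10, 100, 1024):
    print(f"{size_gib:>5} GiB stored -> {bursting_baseline_mibps(size_gib):.2f} MiB/s baseline")

print(f"GiB needed for a 100 MiB/s baseline: {stored_gib_for_baseline(100):.0f}")
```

A 10 GiB filesystem gets under half a MiB/s of baseline, which is why it crawls once credits are gone, and why padding it with dummy data to buy throughput is a worse fix than switching to Elastic.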

Next steps

You can now size block and file storage to the limit that actually binds, and confirm it with fio and CloudWatch.
