Mastering Kubernetes Storage with CSI: Volume Snapshots, Cloning, Online Resize, and Topology-Aware Provisioning

Most teams stop learning CSI the moment a PVC binds. That is a mistake. The interesting half of the Container Storage Interface (snapshots, cloning, online expansion, and topology-aware placement) is exactly the half you reach for during an incident, a migration, or a 2 a.m. restore. This guide walks the full feature set with real manifests, the sidecars that make each feature work, and the failure modes that actually page you. Examples assume a cluster on Kubernetes 1.27+ with a modern CSI driver. The manifests use the AWS EBS driver; the Azure Disk, GCE PD, and Ceph drivers follow the same object model, though parameters and topology keys differ per driver.

1. The CSI architecture you actually need to understand

A CSI driver is not one process. It is a node-level plugin (a DaemonSet that mounts volumes onto the host) plus a controller deployment that wraps the vendor’s CSI gRPC server with a set of Kubernetes-aware sidecar containers. Each sidecar watches one kind of object and translates it into a CSI call. You should know which sidecar owns which feature, because when a feature silently does nothing, the answer is almost always “that sidecar isn’t deployed or isn’t permitted.”

Sidecar                Watches                CSI calls it drives             Feature it enables
---------------------  ---------------------  ------------------------------  ---------------------------
external-provisioner   PVC / PV               CreateVolume, DeleteVolume      Dynamic provisioning
external-attacher      VolumeAttachment       ControllerPublishVolume         Attach/detach to nodes
external-snapshotter   VolumeSnapshotContent  CreateSnapshot, DeleteSnapshot  Snapshots
external-resizer       PVC (spec change)      ControllerExpandVolume          Online/offline resize
node-driver-registrar  (none)                 NodeGetInfo registration        Kubelet plugin registration
livenessprobe          (none)                 Probe                           Health endpoint

The mental model: the controller sidecars run as a Deployment (often leader-elected, with two or more replicas), and they only make control-plane calls to your cloud’s storage API. The node plugin does the actual NodeStageVolume / NodePublishVolume mount work and is where the filesystem resize physically happens. The snapshotter and resizer are optional: a driver can ship without them, which is the first thing to check before you debug for an hour.

# Which sidecars is your driver actually running?
kubectl -n kube-system get deploy,daemonset -l app.kubernetes.io/name=aws-ebs-csi-driver
kubectl -n kube-system get pod -l app=ebs-csi-controller -o jsonpath='{.items[0].spec.containers[*].name}'
# Expect: ebs-plugin csi-provisioner csi-attacher csi-snapshotter csi-resizer liveness-probe
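
The comparison against the expected list can be automated. The following is a minimal sketch, assuming the container names shown above for the EBS controller pod; other drivers may name their sidecars differently, and missing_sidecars is a hypothetical helper name.

```shell
# Print every expected sidecar that is absent from a space-separated list of
# container names. The expected names are an assumption based on the EBS
# controller pod; adjust them for your driver.
missing_sidecars() {
  local running=" $1 "
  local name
  for name in csi-provisioner csi-attacher csi-snapshotter csi-resizer; do
    case "$running" in
      *" $name "*) ;;                 # present, nothing to report
      *) printf '%s\n' "$name" ;;     # absent: the feature it owns will not work
    esac
  done
}
```

Feed it the output of the jsonpath query above, for example missing_sidecars "$(kubectl -n kube-system get pod -l app=ebs-csi-controller -o jsonpath='{.items[0].spec.containers[*].name}')". Empty output means all four controller sidecars are present.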

2. Install the snapshot CRDs and controller (they ship outside core Kubernetes)

This trips up nearly everyone. VolumeSnapshot, VolumeSnapshotContent, and VolumeSnapshotClass are not part of core Kubernetes. They live in the external-snapshotter project and consist of two pieces you must install yourself unless your managed control plane already did it:

  1. The three CRDs.
  2. The snapshot-controller: a cluster-wide controller (one per cluster) that handles the common, vendor-independent snapshot logic and binds VolumeSnapshot to VolumeSnapshotContent. This is distinct from the per-driver csi-snapshotter sidecar.

# Pin to a release tag; never apply from a moving branch in production.
SNAP_VERSION=v8.2.0
BASE=https://raw.githubusercontent.com/kubernetes-csi/external-snapshotter/${SNAP_VERSION}

# 1. CRDs
kubectl apply -f ${BASE}/client/config/crd/snapshot.storage.k8s.io_volumesnapshotclasses.yaml
kubectl apply -f ${BASE}/client/config/crd/snapshot.storage.k8s.io_volumesnapshotcontents.yaml
kubectl apply -f ${BASE}/client/config/crd/snapshot.storage.k8s.io_volumesnapshots.yaml

# 2. The shared snapshot-controller (RBAC + Deployment)
kubectl apply -f ${BASE}/deploy/kubernetes/snapshot-controller/rbac-snapshot-controller.yaml
kubectl apply -f ${BASE}/deploy/kubernetes/snapshot-controller/setup-snapshot-controller.yaml

Managed clusters differ. On EKS the EBS CSI add-on ships only the sidecar, so you install the CRDs and controller yourself or enable the separate snapshot controller add-on where it is offered. AKS and GKE install the controller and CRDs for you on recent versions. Run kubectl get crd | grep snapshot before assuming anything. A mismatch between the CRD versions the snapshot-controller expects and the ones the csi-snapshotter sidecar expects is a classic cause of snapshots that stay readyToUse: false forever.

3. Define a VolumeSnapshotClass and take an application-consistent snapshot

A VolumeSnapshotClass is to snapshots what a StorageClass is to volumes: it names the driver and sets a deletion policy. Retain keeps the underlying cloud snapshot when the Kubernetes object is deleted; Delete garbage-collects it.

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: ebs-snapclass
driver: ebs.csi.aws.com
deletionPolicy: Retain
parameters:
  # Driver-specific. EBS tags the snapshot; useful for cost allocation and Velero.
  tagSpecification_1: "Name=k8s-csi-snapshot"

Now the crisp distinction that matters in production: CSI does not freeze your application. A snapshot is crash-consistent — equivalent to pulling the power cord. For a database that is usually recoverable via WAL replay, but “usually” is not a backup strategy. For application consistency you quiesce the workload around the snapshot. The dependable pattern is a brief flush-and-lock:

# Application-consistent snapshot of a Postgres PVC.
# 1. Checkpoint so the on-disk state is current, then snapshot immediately.
kubectl -n data exec postgres-0 -- psql -U postgres -c "CHECKPOINT;"

cat <<'EOF' | kubectl apply -f -
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-snap-2026-05-30
  namespace: data
spec:
  volumeSnapshotClassName: ebs-snapclass
  source:
    persistentVolumeClaimName: data-postgres-0
EOF

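For stricter guarantees, use a brief filesystem freeze (fsfreeze -f / -u) or the engine’s hot-backup mode (pg_backup_start / pg_backup_stop on PostgreSQL 15+, both called from the same session) around the snapshot. Note that kubectl apply only records the request: hold the freeze until the VolumeSnapshot reports status.creationTime, which marks the moment the driver cut the snapshot. On most cloud drivers the copy to durable storage continues in the background after that point, so the lock window is typically seconds rather than the length of the upload. Watch the object until it reports ready: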

kubectl -n data get volumesnapshot postgres-snap-2026-05-30 \
  -o jsonpath='{.status.readyToUse} {.status.restoreSize}{"\n"}'
# true 20Gi
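
The one-shot query above can be wrapped in a polling loop for use in scripts. This is a minimal sketch, assuming kubectl is on the PATH; wait_snapshot_ready is a hypothetical helper name, and the default timeout and interval are arbitrary choices.

```shell
# Poll a VolumeSnapshot until status.readyToUse is "true" or the timeout expires.
# Arguments: namespace, snapshot name, timeout in seconds (default 300),
# poll interval in seconds (default 5, must be at least 1).
wait_snapshot_ready() {
  local ns="$1" name="$2" timeout="${3:-300}" interval="${4:-5}"
  local waited=0 ready
  while [ "$waited" -lt "$timeout" ]; do
    ready="$(kubectl -n "$ns" get volumesnapshot "$name" \
      -o jsonpath='{.status.readyToUse}' 2>/dev/null)"
    if [ "$ready" = "true" ]; then
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "snapshot $ns/$name not ready after ${timeout}s" >&2
  return 1
}
```

A call such as wait_snapshot_ready data postgres-snap-2026-05-30 600 returns non-zero on timeout, so a backup script can fail loudly instead of assuming the snapshot exists.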

4. Restore a PVC from a snapshot, and clone a live volume

Restore and clone are the same mechanism: a dataSource on a fresh PVC. The provisioner sees the reference and calls CreateVolume with a source instead of provisioning empty.

Restore from a snapshot — point dataSource at the VolumeSnapshot:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-postgres-restored
  namespace: data
spec:
  storageClassName: ebs-sc
  dataSource:
    name: postgres-snap-2026-05-30
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 20Gi   # must be >= snapshot restoreSize
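
The size rule in the comment above can be checked before the PVC is submitted. The sketch below assumes both sizes are whole numbers with a binary suffix (Ki, Mi, Gi, Ti); to_bytes and request_fits_snapshot are hypothetical helper names, and decimal suffixes or fractional quantities are deliberately not handled.

```shell
# Convert a Kubernetes binary-suffix quantity (Ki, Mi, Gi, Ti) or a plain
# integer to bytes. Decimal suffixes (k, M, G) and fractions are not handled.
to_bytes() {
  local q="$1"
  case "$q" in
    *Ki) echo $(( ${q%Ki} * 1024 )) ;;
    *Mi) echo $(( ${q%Mi} * 1024 * 1024 )) ;;
    *Gi) echo $(( ${q%Gi} * 1024 * 1024 * 1024 )) ;;
    *Ti) echo $(( ${q%Ti} * 1024 * 1024 * 1024 * 1024 )) ;;
    *)   echo "$q" ;;
  esac
}

# Succeed only if the requested PVC size ($1) is at least the snapshot's
# restoreSize ($2).
request_fits_snapshot() {
  [ "$(to_bytes "$1")" -ge "$(to_bytes "$2")" ]
}
```

For example, request_fits_snapshot 20Gi "$(kubectl -n data get volumesnapshot postgres-snap-2026-05-30 -o jsonpath='{.status.restoreSize}')" succeeds when the restore request is large enough.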

Clone a live PVC — point dataSource at the source PersistentVolumeClaim (no snapshot needed):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-postgres-clone
  namespace: data
spec:
  storageClassName: ebs-sc
  dataSource:
    name: data-postgres-0
    kind: PersistentVolumeClaim
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 20Gi

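Constraints that will bite you: the source and destination PVC must be in the same namespace and use the same volumeMode, and the destination StorageClass must use the same provisioner as the source (current Kubernetes allows a different class, but not a different driver). The driver must also advertise clone support, which not every driver does, so check its documentation first. For most zonal drivers the clone has to land in the source volume’s zone: you cannot clone a us-east-1a volume into a PVC that resolves to us-east-1b. Finally, cloning copies live blocks, so the clone is crash-consistent against an active writer. Quiesce the source if you need the clone to be coherent.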

5. Enable allowVolumeExpansion and resize online

Set one field on the StorageClass and you unlock expansion. Note: this is one-way — you can grow a volume, never shrink it.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc
provisioner: ebs.csi.aws.com
allowVolumeExpansion: true        # the line that enables resize
volumeBindingMode: WaitForFirstConsumer
parameters:
  type: gp3
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete

To resize, edit the PVC’s spec.resources.requests.storage upward. The external-resizer calls ControllerExpandVolume to grow the backing disk, then the node plugin grows the filesystem through NodeExpandVolume. Online resize (no pod restart) is supported by modern drivers on ext4 and xfs. The ExpandInUsePersistentVolumes feature behind it went GA in Kubernetes 1.24, so on 1.27+ there is no feature gate to turn on.

# Grow from 20Gi to 50Gi, in place, pod stays running.
kubectl -n data patch pvc data-postgres-0 --type merge \
  -p '{"spec":{"resources":{"requests":{"storage":"50Gi"}}}}'

# Watch the two-phase progression via conditions.
kubectl -n data get pvc data-postgres-0 \
  -o jsonpath='{.status.capacity.storage} | {range .status.conditions[*]}{.type}={.status} {end}{"\n"}'
# 20Gi | Resizing=True
# ...then...
# 50Gi |   (conditions cleared, capacity updated)

If you see the condition FileSystemResizePending, the cloud disk grew but the node-side filesystem grow hasn’t completed — for older drivers this clears on the next pod restart. Most current drivers do it live.
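
The two-phase progression can be turned into a small classifier for dashboards or scripts. This is a sketch only: resize_phase and its output labels are hypothetical, and it assumes you pass in values already read from the PVC.

```shell
# Classify where a PVC expansion stands from three observed values:
#   $1 = .spec.resources.requests.storage
#   $2 = .status.capacity.storage
#   $3 = space-separated condition types from .status.conditions[*].type
resize_phase() {
  local requested="$1" capacity="$2" conditions=" $3 "
  case "$conditions" in
    *" FileSystemResizePending "*) echo "disk-grown-filesystem-pending"; return ;;
    *" Resizing "*)                echo "controller-expanding"; return ;;
  esac
  if [ "$requested" = "$capacity" ]; then
    echo "done"
  else
    echo "not-started"
  fi
}
```

A result of not-started that persists after a patch usually means the request never reached a resizer, which points back at a missing csi-resizer sidecar or allowVolumeExpansion being unset on the StorageClass.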

6. Topology-aware provisioning: keep volumes and pods in the same zone

Block storage in the cloud is zonal. An EBS volume in us-east-1a cannot attach to a node in us-east-1b, full stop. If the scheduler places your pod in 1b but the provisioner already cut the volume in 1a, the pod is wedged forever. The fix has two parts.

volumeBindingMode: WaitForFirstConsumer (shown above) is the critical one. It tells the provisioner not to create the volume until a pod is scheduled, so the volume is cut in the zone the scheduler actually picked. The default Immediate mode provisions eagerly and is the single most common cause of unschedulable stateful pods across zones. Use WaitForFirstConsumer for any zonal block storage.

allowedTopologies narrows the candidate zones — useful when you must keep volumes out of a zone (capacity, compliance, or a paired-AZ requirement):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc-zoned
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
  type: gp3
allowedTopologies:
  - matchLabelExpressions:
      - key: topology.ebs.csi.aws.com/zone
        values:
          - us-east-1a
          - us-east-1b

The topology key is driver-specific (topology.ebs.csi.aws.com/zone, topology.gke.io/zone, etc.) — copy it from the driver’s CSINode object, do not assume topology.kubernetes.io/zone. For a StatefulSet that must spread one replica per zone, pair this StorageClass with topologySpreadConstraints on the pod template so the scheduler distributes replicas and the provisioner follows.
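
The spread constraint mentioned above could look like the fragment this helper prints. It is a minimal sketch: render_zone_spread is a hypothetical name, the app label key is an assumption about how the pods are labelled, and the default key is the well-known node label topology.kubernetes.io/zone, because pod spreading is evaluated against node labels rather than the driver’s CSI topology key.

```shell
# Print a topologySpreadConstraints fragment for a StatefulSet pod template.
# $1 = value of the pods' "app" label (assumed label key)
# $2 = node label to spread over (defaults to the well-known zone label)
render_zone_spread() {
  local app="$1" key="${2:-topology.kubernetes.io/zone}"
  cat <<EOF
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: ${key}
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: ${app}
EOF
}
```

Paste the output of render_zone_spread postgres under spec.template.spec of the StatefulSet. With WaitForFirstConsumer, each replica’s volume is then provisioned in whichever zone the scheduler chose for that replica.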

7. Back it up properly: Velero with CSI snapshot data movement

Native CSI snapshots are fast, but they usually live in the same account and region as the source disk. That is not disaster recovery; it is a fast undo. Velero closes the gap. Its CSI support takes a VolumeSnapshot and, with CSI snapshot data movement (introduced in Velero 1.12), copies the snapshot’s contents to object storage in another region through the node agent’s Kopia uploader, decoupling your backup from the cloud snapshot’s lifecycle.

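# --secret-file is required unless you pass --no-secret (for example with IRSA);
# ./credentials-velero is a placeholder path.
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.10.0 \
  --bucket velero-prod-backups \
  --secret-file ./credentials-velero \
  --backup-location-config region=us-west-2 \
  --use-node-agent \
  --features=EnableCSI \
  --snapshot-location-config region=us-east-1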

# Back up a namespace and MOVE snapshot data to the bucket (cross-region durable).
velero backup create data-2026-05-30 \
  --include-namespaces data \
  --snapshot-move-data \
  --wait

--snapshot-move-data is the flag that turns an in-region CSI snapshot into a portable, cross-region backup object. Restores then provision fresh PVCs from that data, which also lets you restore into a different cluster or region — the real test of any backup.
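
Whether the data actually left the region can be checked from the DataUpload objects Velero creates during data movement. The following is a sketch under stated assumptions: Velero is installed in the velero namespace, DataUploads carry the velero.io/backup-name label as Velero’s data-movement documentation describes, and data_uploads_done is a hypothetical helper name.

```shell
# Succeed only when every DataUpload for a backup has phase "Completed".
# Assumes the "velero" namespace and the velero.io/backup-name label.
data_uploads_done() {
  local backup="$1" phases phase
  phases="$(kubectl -n velero get datauploads.velero.io \
    -l "velero.io/backup-name=${backup}" \
    -o jsonpath='{range .items[*]}{.status.phase}{"\n"}{end}')"
  [ -n "$phases" ] || return 1          # no uploads found: nothing was moved
  for phase in $phases; do
    [ "$phase" = "Completed" ] || return 1
  done
}
```

Running data_uploads_done data-2026-05-30 after the backup finishes distinguishes a backup that only holds in-region snapshots from one whose volume data reached the bucket.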

Verify

Run these end-to-end before you trust the setup. Each line proves one feature actually works rather than merely being configured.

# Snapshot CRDs + controller present
kubectl get crd | grep snapshot.storage.k8s.io        # 3 CRDs
kubectl -n kube-system get deploy snapshot-controller  # all replicas ready (the upstream manifest runs 2)

# Snapshot reaches readyToUse
kubectl -n data get volumesnapshot -o wide

# Restored PVC binds at the right size (with WaitForFirstConsumer it stays Pending until a pod mounts it)
kubectl -n data get pvc data-postgres-restored          # STATUS=Bound, CAPACITY=20Gi

# Expansion actually grew the filesystem inside the pod
kubectl -n data exec postgres-0 -- df -h /var/lib/postgresql/data  # shows 50G

# Topology landed the volume in the pod's zone (they MUST match)
kubectl get pv -o custom-columns=\
'NAME:.metadata.name,ZONE:.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[0]'
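NODE=$(kubectl -n data get pod postgres-0 -o jsonpath='{.spec.nodeName}')
kubectl get node "$NODE" -L topology.kubernetes.io/zone   # ZONE column must equal the PV's zone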

If a snapshot is stuck on readyToUse: false, describe its VolumeSnapshotContent and read the status.error — that is where the driver’s real message surfaces, not on the VolumeSnapshot.
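
That lookup takes two hops, from the VolumeSnapshot to its bound content and then to the error. A minimal sketch, assuming kubectl is on the PATH; snapshot_error is a hypothetical helper name.

```shell
# Print the driver error recorded on the VolumeSnapshotContent bound to a
# VolumeSnapshot. $1 = namespace, $2 = VolumeSnapshot name.
snapshot_error() {
  local ns="$1" name="$2" content
  content="$(kubectl -n "$ns" get volumesnapshot "$name" \
    -o jsonpath='{.status.boundVolumeSnapshotContentName}')"
  if [ -z "$content" ]; then
    echo "no VolumeSnapshotContent bound yet; check the snapshot-controller logs" >&2
    return 1
  fi
  # VolumeSnapshotContent is cluster-scoped, so no namespace flag here.
  kubectl get volumesnapshotcontent "$content" \
    -o jsonpath='{.status.error.message}{"\n"}'
}
```

If the helper reports that nothing is bound, the request never reached the driver, so the shared snapshot-controller is the place to look rather than the csi-snapshotter sidecar.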

Enterprise scenario

A fintech platform team ran a 40-node multi-tenant Postgres fleet on EKS across three AZs, provisioned by the EBS CSI driver with the default Immediate binding mode inherited from an old StorageClass. It worked until a regional capacity event in us-east-1a forced the cluster autoscaler to bring up replacement nodes only in 1b and 1c. New StatefulSet replicas scheduled onto the surviving zones — but the provisioner, in Immediate mode, had already cut their PVCs in 1a minutes earlier, so the volumes could not attach. Roughly a third of the fleet’s restarting pods wedged in Pending with node(s) had volume node affinity conflict, and the team could not fail over.

The root cause was binding mode, not capacity. They switched the StorageClass to WaitForFirstConsumer so the volume is provisioned only after the scheduler commits a pod to a node, guaranteeing zone co-location, and constrained placement with allowedTopologies plus a per-zone spread constraint. They also moved their hourly snapshots to Velero with --snapshot-move-data into us-west-2, so a zone or region event no longer stranded both the data and its backup in the same place.

# The one-line change that prevents cross-zone attach deadlock.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-sc-zoned
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer   # was: Immediate
allowVolumeExpansion: true
parameters:
  type: gp3

Because volumeBindingMode cannot be changed on an existing StorageClass, and a StatefulSet’s volumeClaimTemplates are immutable as well, the migration was create-new-class plus rolling replacement of the StatefulSets onto it, not an in-place edit. Plan that window.
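
Before planning such a migration it helps to know which classes still bind eagerly. The filter below is a small sketch; immediate_classes is a hypothetical helper that expects one "NAME BINDING_MODE" pair per line on stdin.

```shell
# Read "NAME BINDING_MODE" pairs on stdin and print the StorageClasses that
# still use Immediate binding.
immediate_classes() {
  awk '$2 == "Immediate" { print $1 }'
}
```

One way to feed it is kubectl get sc --no-headers -o custom-columns=NAME:.metadata.name,MODE:.volumeBindingMode | immediate_classes. Any zonal block-storage class in the output is a candidate for replacement.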

Troubleshooting: stuck attachments, finalizers, and orphans

These are the recurring failure modes, in order of how often they page.

Checklist
